Behavioral Simulations in MapReduce 



Guozhang Wang, Marcos Vaz Salles, Benjamin Sowell, Xun Wang, Tuan Cao, 
Alan Demers, Johannes Gehrke, Walker White 

Cornell University 
Ithaca, NY 14853, USA 

{guoz, vmarcos, sowell, tuancao, ademers, Johannes, wmwhite}@cs.cornell.edu 

xw239@cornell.edu






ABSTRACT 

In many scientific domains, researchers are turning to large-scale 
behavioral simulations to better understand important real-world 
phenomena. While there has been a great deal of work on sim- 
ulation tools from the high-performance computing community, 
behavioral simulations remain challenging to program and auto- 
matically scale in parallel environments. In this paper we present 
BRACE (Big Red Agent-based Computation Engine), which ex- 
tends the MapReduce framework to process these simulations ef- 
ficiently across a cluster. We can leverage spatial locality to treat 
behavioral simulations as iterated spatial joins and greatly reduce 
the communication between nodes. In our experiments we achieve 
nearly linear scale-up on several realistic simulations. 

Though processing behavioral simulations in parallel as iterated 
spatial joins can be very efficient, it can be much simpler for the do- 
main scientists to program the behavior of a single agent. Further- 
more, many simulations include a considerable amount of complex 
computation and message passing between agents, which makes 
it important to optimize the performance of a single node and the 
communication across nodes. To address both of these challenges, 
BRACE includes a high-level language called BRASIL (the Big 
Red Agent Simulation Language). BRASIL has object-oriented
features for programming simulations, but can be compiled to a 
data-flow representation for automatic parallelization and optimiza- 
tion. We show that by using various optimization techniques, we 
can achieve both scalability and single-node performance similar 
to that of a hand-coded simulation. 



1. INTRODUCTION 

Behavioral simulations, also called agent-based simulations, are instrumental in tackling the ecological and infrastructure challenges of our society. These simulations allow scientists to understand large complex systems such as transportation networks, insect swarms, or fish schools by modeling the behavior of millions of individual agents inside the system [5, 10, 12].

For example, transportation simulations are being used to address traffic congestion by evaluating proposed traffic management systems before implementing them [10]. This is a tremendously important problem, as traffic congestion cost $87.2 billion and required 2.8 billion gallons of extra fuel and 4.2 billion hours of extra time in the U.S. in 2007 alone [38]. Scientists also use behavioral simulations to model collective animal motion, such as that of locust swarms or fish schools [5, 12]. Understanding these phenomena is crucial, as they directly affect human food security [22].

Despite their huge importance, it remains difficult to develop large-scale behavioral simulations. Current systems either offer high-level programming abstractions but are not scalable [20, 29, 32], or achieve scalability by hand-coding particular simulation models using low-level parallel frameworks, such as MPI [43].

This paper proposes to close this gap by bringing database-style
programmability and scalability to agent-based simulations. Our 
core insight is that behavioral simulations may be regarded as com- 
putations driven by large iterated spatial joins. We introduce a new 
simulation engine, called BRACE (Big Red Agent-based Compu- 
tation Engine), that extends the popular MapReduce dataflow pro- 
gramming model to these iterated computations. BRACE embod- 
ies a high-level programming language called BRASIL, which is 
compiled into an optimized shared-nothing, in-memory MapRe- 
duce runtime. The design of BRACE is motivated by the require- 
ments of behavioral simulations, explained below. 

1.1 Requirements for Simulation Platforms 

We have identified several key features that are necessary for a 
generic behavioral simulation platform. 

(1) Support for Complex Agent Interaction. Behavioral simula- 
tions include frequent local interactions between agents. In partic- 
ular, agents may affect the behavior decisions of other agents, and 
multiple agents may issue concurrent writes to the same agent. A 
simulation framework should support a high degree of agent inter- 
action without excessive synchronization or rollbacks. This pre- 
cludes discrete event simulation engines or other approaches based 
on task parallelism and asynchronous message exchange. 

(2) Automatic Scalability. Scientists need to scale their simulations to millions or billions of agents to accurately model phenomena such as city-wide traffic or swarms of insects [5, 12, 19]. These
scales make it essential to use data parallelism to distribute agents 
across many nodes. This is complicated by the interaction between 
agents, which may require communication between several nodes. 
Rather than requiring scientists to write complex and error-prone 
parallel code, the platform should automatically distribute agents 
to achieve scalability. 

(3) High Performance. Behavioral simulations are often ex- 
tremely complex, involving sophisticated numerical computations 
and elaborate decision procedures. Much existing work on behav- 
ioral simulations is from the high-performance computing commu- 
nity, and they frequently resort to hand-coding specific simulations 
in a low-level language to achieve acceptable performance [19, 33].
A general purpose framework must be competitive with these hand- 
coded applications in order to gain acceptance. 

(4) Commodity Hardware. Historically, many scientists have 
used large shared-memory supercomputer systems for their sim- 
ulations. Such machines are tremendously expensive, and cannot 
scale beyond their original capacity. We believe that the next generation of simulation platforms will target shared-nothing systems and will be deployed on local clusters or in the cloud on services like Amazon's EC2 [1].

(5) Simple Programming Model. Domain scientists have shown 
their willingness to try simulation platforms that provide simple, 
high-level programming abstractions, even at some cost in performance and scalability [20, 29, 32]. Nevertheless, a behavioral simulation framework should provide an expressive and high-level programming model without sacrificing performance.

1.2 Contributions 

We begin our presentation in Section 2.1 by describing important properties of behavioral simulations that we leverage in our
platform. We then move on to the main contributions of this paper: 

• We show how MapReduce can be used to scale behavioral simulations across clusters. We abstract these simulations as iterated spatial joins and introduce a new main-memory MapReduce runtime that incorporates optimizations motivated by the spatial properties of simulations (Section 3).

• We present a new scripting language for simulations that compiles into our MapReduce framework and allows for algebraic optimizations in mappers and reducers. This language hides all the complexities of modeling computations in MapReduce and parallel programming from domain scientists (Section 4).

• We perform an experimental evaluation with two real-world behavioral simulations that shows our system has nearly linear scale-up and single-node performance that is comparable to a hand-coded simulation (Section 5).

We review related work in Section 6 and conclude in Section 7.

2. BACKGROUND 

In this section, we introduce some important properties of behavioral simulations and review the MapReduce programming model. In
the next section we exploit these properties to abstract the compu- 
tations in the behavioral simulations and use MapReduce to effi- 
ciently process them. 

2.1 Properties of Behavioral Simulations 

Behavioral simulations model large numbers of individual agents 
that interact in a complex environment. Unlike scientific simu- 
lations that can be modeled as systems of equations, agents in a 
behavioral simulation can execute complex programs that include 
non-trivial control flow. Nevertheless, most behavioral simulations have a similar structure, which we introduce below. We will use a traffic simulation [47] and a fish school simulation [12] as running examples. Details on these simulation models can be found in Appendix C.

The State-Effect Pattern. Most behavioral simulations use a time- 
stepped model in which time is discretized into "ticks" that repre- 
sent the smallest time period of interest. Events that occur during 
the same tick are treated as simultaneous and can be reordered or 
parallelized. This means that an agent's decisions cannot be based 
on previous actions made during the same tick. An agent can only 
read the state of the world as of the previous tick. For example, in 
the traffic simulation, each car inspects the positions and velocities 
of other cars as of the beginning of the tick in order to make lane 
changing decisions. 

In previous work on scaling computer games, we proposed a model for this kind of time-stepped behavior called the state-effect pattern [44, 45]. The basic idea is to separate read and write operations in order to limit the synchronization necessary between agents. In the state-effect pattern, the attributes of an agent are separated into states and effects, where states are public attributes that are updated only at tick boundaries, and effects are used for intermediate computations as agents interact. Since state attributes remain fixed during a tick, they only need to be synchronized at the end of each tick. Furthermore, each effect attribute has an associated decomposable and order-independent combinator function for combining multiple assignments during a tick. This allows us to compute effects in parallel and combine the results without worrying about concurrent writes. For example, the fish school simulation uses vector addition so that each fish may combine the orientation of nearby fish into an effect attribute. Since vector addition is commutative and associative, we can process these assignments in any order.
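To make the combinator requirement concrete, here is a minimal Python sketch, assuming an orientation effect stored as a 2-D tuple. The names `vec_add` and `combine` are illustrative only, not part of BRACE. The sketch checks that every ordering of the same assignments yields one value, and that partial aggregates can be merged afterwards.

```python
from functools import reduce
from itertools import permutations
from typing import Callable, Iterable, Tuple

Vec = Tuple[float, float]


def vec_add(a: Vec, b: Vec) -> Vec:
    """Combinator for an orientation effect: component-wise vector addition."""
    return (a[0] + b[0], a[1] + b[1])


def combine(writes: Iterable[Vec], combinator: Callable[[Vec, Vec], Vec], identity: Vec) -> Vec:
    """Fold every assignment made to one effect attribute during a tick."""
    return reduce(combinator, writes, identity)


# Orientations that three nearby fish assign to the same fish's effect attribute.
writes = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
zero = (0.0, 0.0)

# Order independence: every ordering of the assignments gives the same effect value.
results = {combine(order, vec_add, zero) for order in permutations(writes)}
print(results)  # {(0.0, 3.0)}

# Decomposability: partial aggregates (e.g., computed on different nodes) combine later.
partial_a = combine(writes[:2], vec_add, zero)
partial_b = combine(writes[2:], vec_add, zero)
print(vec_add(partial_a, partial_b))  # (0.0, 3.0)
```

A combinator that was not associative and commutative (for example, "keep the last write") would make the result depend on the order in which nodes happen to process assignments.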

In the state-effect pattern, each tick is divided 
into two phases: the query phase and the update 
phase, as shown in the figure on the right. In the 
query phase, each agent queries the state of the 
world and assigns effect values, which are com- 
bined using the appropriate combinator function. 
To ensure the property that the actions during a tick 
are conceptually simultaneous, state variables are 
read-only during the query phase and effect vari- 
ables are write-only. 

In the update phase, each agent can read its state attributes and 
the effect attributes computed from the query phase; it uses these 
values to compute the new state attributes for the next tick. In the 
fish school simulation, the orientation effects computed during the 
query phase are read during the update phase to compute a fish's 
new velocity vector, represented as a state attribute. In order to 
ensure that updates do not conflict, each agent can only read and 
write its own attributes during the update phase. Hence, the only 
way that agents can communicate is through effect assignments in 
the query phase. We classify effect assignments into local and non- 
local assignments. In a local assignment, an agent updates one of 
its own effect attributes; in a non-local assignment, an agent writes 
to an effect attribute of a different agent. 
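A single tick under the state-effect pattern can be sketched in Python as follows. This is an illustration under simplifying assumptions, not BRACE code: agents live on a line, the hypothetical effects `push` and `count` both use a sum combinator, and the update rule is invented for the example. The query phase only reads states and writes effects (including a non-local write to another agent), while the update phase lets each agent read and write only its own attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Agent:
    x: float  # state attribute: read-only during the query phase
    effects: Dict[str, float] = field(default_factory=dict)  # write-only during the query phase


def assign(agent: Agent, name: str, value: float) -> None:
    """Effect assignment with a sum combinator, so the order of calls does not matter."""
    agent.effects[name] = agent.effects.get(name, 0.0) + value


def query_phase(agents: List[Agent]) -> None:
    for a in agents:
        for b in agents:
            if b is a:
                continue
            # Non-local assignment: a pushes b away from itself, reading only state.
            assign(b, "push", 1.0 if b.x > a.x else -1.0)
            # Local assignment: a counts the neighbours it inspected.
            assign(a, "count", 1.0)


def update_phase(agents: List[Agent], step: float = 0.5) -> None:
    for a in agents:
        # Each agent reads and writes only its own attributes.
        count = a.effects.get("count", 0.0)
        if count:
            a.x += step * a.effects.get("push", 0.0) / count
        a.effects.clear()  # effects do not survive the tick boundary


def run_tick(agents: List[Agent]) -> None:
    query_phase(agents)
    update_phase(agents)


school = [Agent(0.0), Agent(1.0), Agent(2.0)]
run_tick(school)
print([a.x for a in school])  # [-0.5, 1.0, 2.5]
```

Because the query phase never modifies `x`, the order in which agents are visited does not change the result.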

The Neighborhood Property. The state-effect pattern can be used 
to limit the synchronization necessary during a tick, but it is still 
possible that every agent needs to query every other agent in the 
simulated world to compute its effects. We observe that this rarely 
occurs in practice. In particular, we observe that most behavioral 
simulations are eminently spatial, and simulated agents can only 
interact with other agents that are close according to a distance metric [27]. For example, a fish can only observe other fish within a limited distance ρ.
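One common way to exploit the neighborhood property is to bucket agents into a uniform grid whose cell side equals the visibility distance ρ, so that each agent inspects only the 3x3 block of cells around it instead of the whole world. The Python sketch below assumes 2-D points and Euclidean distance; the helper names are hypothetical, and this is an illustration rather than the partitioning scheme BRACE uses.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def cell_of(p: Point, rho: float) -> Cell:
    return (math.floor(p[0] / rho), math.floor(p[1] / rho))


def build_grid(points: List[Point], rho: float) -> Dict[Cell, List[int]]:
    """Bucket agents into square cells of side rho."""
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        grid[cell_of(p, rho)].append(i)
    return grid


def neighbours(i: int, points: List[Point], grid: Dict[Cell, List[int]], rho: float) -> List[int]:
    """Agents within distance rho of agent i; only the 3x3 block of cells can contain them."""
    cx, cy = cell_of(points[i], rho)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), []):
                if j != i and math.dist(points[i], points[j]) <= rho:
                    found.append(j)
    return sorted(found)


fish = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (0.0, 0.9)]
grid = build_grid(fish, rho=1.0)
print(neighbours(0, fish, grid, 1.0))  # [1, 3]
print(neighbours(2, fish, grid, 1.0))  # []
```

With a bounded ρ, the work per agent depends on local density rather than on the total number of agents.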

2.2 MapReduce 

Since its introduction in 2004, MapReduce has become one of 
the most successful models for processing long-running computations in distributed shared-nothing environments [15]. While it
was originally designed for very large batch computations, MapRe- 
duce is ideal for behavioral simulations because it provides auto- 
matic scalability, which is one of the key requirements for next- 
generation platforms. By varying the degree of data partitioning 
and the corresponding number of map and reduce tasks, the same 
MapReduce program can be run on one machine or one thousand. 

Table 1: The state-effect pattern in MapReduce.

The MapReduce programming model is based on two functional programming primitives that operate on key-value pairs. The map function takes a key-value pair and produces a set of intermediate key-value pairs, $\mathit{map}: (k_1, v_1) \rightarrow [(k_2, v_2)]$, while the reduce function collects all of the intermediate pairs with the same key and produces a value, $\mathit{reduce}: (k_2, [v_2]) \rightarrow [v_3]$. Since simulations consist of many ticks, we will use an iterative MapReduce model in which the output of the reduce step is fed into the next map step. Formally, this means that we change the output of the reduce step to be $[(k_1, v_3)]$.
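The iterated model can be sketched as a small single-process Python driver. The function names and the toy mapper and reducer are hypothetical; the point is only the signature chain, where the reducer emits (k1, v3) pairs that become the next round's map input.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, int]


def mapreduce_round(pairs: Iterable[Pair],
                    mapper: Callable[[str, int], Iterable[Tuple[int, Pair]]],
                    reducer: Callable[[int, List[Pair]], Iterable[Pair]]) -> List[Pair]:
    """One round: map (k1, v1) -> [(k2, v2)], group by k2, reduce (k2, [v2]) -> [(k1, v3)]."""
    groups = defaultdict(list)
    for k1, v1 in pairs:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)
    output: List[Pair] = []
    for k2, values in groups.items():
        output.extend(reducer(k2, values))
    return output


def iterate(pairs, mapper, reducer, rounds: int) -> List[Pair]:
    """Feed each round's reduce output into the next round's map step."""
    for _ in range(rounds):
        pairs = mapreduce_round(pairs, mapper, reducer)
    return sorted(pairs)


# Toy "simulation": agents on a line, grouped into cells of width 10.
def to_cell(agent: str, pos: int):
    yield pos // 10, (agent, pos)


def crowd_step(cell: int, members: List[Pair]):
    # Every agent advances by the number of agents sharing its cell.
    for agent, pos in members:
        yield agent, pos + len(members)


print(iterate([("a", 0), ("b", 3), ("c", 15)], to_cell, crowd_step, rounds=2))
# [('a', 4), ('b', 7), ('c', 17)]
```

Because the reducer's output has the same shape as the mapper's input, the loop in `iterate` needs no glue code between rounds.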



3. MAPREDUCE FOR SIMULATIONS 

In this section, we abstract behavioral simulations as computations driven by iterated spatial joins and show how they can be expressed in the MapReduce framework (Sections 3.1 and 3.2). We then propose a system called BRACE to process these joins efficiently (Section 3.3).



3.1 Simulations as Iterated Spatial Joins 

In Section 2.1 we observed that behavioral simulations have two
important properties: the state-effect pattern and the neighborhood 
property. The state-effect pattern essentially characterizes behav- 
ioral simulations as iterated computations with two phases: a query 
phase in which agents inspect their environment to compute effects, 
and an update phase in which agents update their own state. 

The neighborhood property introduces two important restrictions 
on each of these phases, visibility and reachability. We say that the 
visible region of an agent a is the region of space containing agents 
that a can read from or assign effects to. Agent a needs access 
to all the agents in its visible region to compute its query phase. 
Thus a simulation in which agents have small visible regions re- 
quires less communication than one with very large or unbounded 
visible regions. Similarly, we can define an agent's reachable re- 
gion as the region that the agent can move to after the update phase. 
This is essentially a measure of how much the spatial distribution 
of agents can change between ticks. When agents have small reach- 
able regions, a spatial partitioning of the agents is likely to remain 
balanced for several ticks. Frequently an agent's reachable region 
will be a subset of its visible region (an agent can't move farther 
than it can see), but this is not required. 

We observe that since agents only query other agents within their 
visible regions, processing a tick is similar to a spatial self-join 
from the database literature [30]. We join each agent with the set of
agents in its visible region and perform the query phase using only 
these agents. During the update phase, agents move to new posi- 
tions within their reachable regions and we perform a new iteration 
of the join during the next tick. We will use these observations 
to parallelize behavioral simulations efficiently in the MapReduce 
framework. 
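The view of a tick as a spatial self-join can be sketched in one dimension as a band join: sort the agents on x, then use binary search to find each agent's visible window. This is a stand-in chosen for brevity, assuming a symmetric visibility radius; the function names are hypothetical and the update rule in `run_ticks` is invented for the example.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List


def band_self_join(pos: Dict[str, float], radius: float) -> Dict[str, List[str]]:
    """Pair every agent with the other agents whose x lies within `radius` of its own."""
    order = sorted(pos, key=pos.get)  # agent ids sorted by position
    xs = [pos[a] for a in order]
    visible: Dict[str, List[str]] = {}
    for a in order:
        lo = bisect_left(xs, pos[a] - radius)   # first agent with x >= pos[a] - radius
        hi = bisect_right(xs, pos[a] + radius)  # one past the last agent with x <= pos[a] + radius
        visible[a] = [b for b in order[lo:hi] if b != a]
    return visible


def run_ticks(pos: Dict[str, float], radius: float, ticks: int) -> Dict[str, float]:
    """Each tick re-runs the join, then moves every agent right by its neighbour count."""
    for _ in range(ticks):
        visible = band_self_join(pos, radius)                    # input to the query phase
        pos = {a: x + len(visible[a]) for a, x in pos.items()}   # update phase
    return pos


print(band_self_join({"f1": 0.0, "f2": 0.5, "f3": 2.5}, 1.0))
# {'f1': ['f2'], 'f2': ['f1'], 'f3': []}
```

Since agents move between ticks, the join is recomputed from scratch each iteration, which is exactly the "iterated" part of the abstraction.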

3.2 Iterated Spatial Joins in MapReduce 

In this section, we show how to model spatial joins in MapReduce. A formal version of this model is included in Appendix A. MapReduce has often been criticized for being inefficient at processing joins [46] and also inadequate for iterative computations without modification [17]. However, the spatial properties of simulations will allow us to process them effectively without excessive communication. Our basic strategy is to use a technique presented by Zhang et al. to compute a spatial join in MapReduce [49]. Each map task is responsible for spatially partitioning agents into a number of disjoint regions, and the reduce tasks join the agents using their visible regions.

The map tasks use a spatial partitioning function to assign each 
agent to a disjoint region of space. This function might divide the 
space into a regular grid or might perform some more sophisticated 
spatial decomposition. Each reducer will process one such parti- 
tion. The set of agents assigned to a particular partition is called 
that partition's owned set. Note that we cannot process each parti- 
tion completely independently because each agent needs access to 
its visible region, and this region may intersect several partitions. 
To address this, we can define the visible region of a partition as the 
region of space visible to some point in the partition. The map task 
will then replicate each agent a to every partition that contains a in 
its visible region. 
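The replication rule for the map tasks can be sketched for a one-dimensional world split into equal-width strips, assuming every agent sees `radius` units in each direction. Under that assumption, the visible region of strip p is the strip widened by `radius` on both sides, so the partitions that must receive an agent form a contiguous range. The function name and record layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def map_distribute(agents: Dict[str, float], width: float, radius: float,
                   num_parts: int) -> List[Tuple[int, Tuple[str, float, bool]]]:
    """Emit (partition, (agent, x, is_replica)) for the owner and every partition that can see the agent."""
    out = []
    for aid, x in agents.items():
        home = math.floor(x / width)  # the partition whose owned set contains the agent
        # Partition p covers [p*width, (p+1)*width); its visible region extends `radius`
        # beyond each edge, so it must receive the agent iff p lies in this range.
        first = max(0, math.floor((x - radius) / width))
        last = min(num_parts - 1, math.floor((x + radius) / width))
        for p in range(first, last + 1):
            out.append((p, (aid, x, p != home)))
    return out


for pair in map_distribute({"a": 5.0, "b": 9.0, "c": 21.0}, width=10.0, radius=2.0, num_parts=3):
    print(pair)
# (0, ('a', 5.0, False))
# (0, ('b', 9.0, False))
# (1, ('b', 9.0, True))
# (1, ('c', 21.0, True))
# (2, ('c', 21.0, False))
```

Agents far from strip edges (such as `a`) are sent only to their owner; replication cost grows with the ratio of the visibility radius to the strip width.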

Table 1 shows how the phases of the state-effect pattern are assigned to map and reduce tasks. For simulations with only local effect assignments, a tick $t$ begins when the first map task, $\mathit{map}_{t1}$, assigns each agent to a partition (distribute). Each reducer is assigned a partition and receives every agent that falls within its owned set as well as replicas of every agent that falls within its visible region. These are exactly the agents necessary to process the query phase of the owned set (query). The reducer, $\mathit{reduce}_{t1}$, outputs a copy of each agent it owns after executing the query phase and updating the agent's effects. The tick ends when the next map task, $\mathit{map}_{(t+1)1}$, executes the update phase (update).
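The two-step tick for local effects (distribute in the map, query in the reduce, update in the next map) can be simulated in a single Python process. Everything here is a hypothetical toy: a 1-D world of three strips, a neighbour-count effect, and an update rule that moves each agent by half its neighbour count. Because each reducer sees its owned set plus the replicas in its visible region, the counts match those of a global computation.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

WIDTH, RADIUS, PARTS = 10.0, 2.0, 3  # hypothetical 1-D world of three partitions


def owner(x: float) -> int:
    return math.floor(x / WIDTH)


def map_distribute(agents: Dict[str, float]) -> Iterator[Tuple[int, Tuple[str, float]]]:
    """map_t1 (distribute): send each agent to its owner and replicate it where it is visible."""
    for aid, x in agents.items():
        first = max(0, math.floor((x - RADIUS) / WIDTH))
        last = min(PARTS - 1, math.floor((x + RADIUS) / WIDTH))
        for p in range(first, last + 1):
            yield p, (aid, x)


def reduce_query(p: int, members: List[Tuple[str, float]]) -> Iterator[Tuple[str, Tuple[float, int]]]:
    """reduce_t1 (query): owned agents compute a local effect, the number of visible neighbours."""
    for aid, x in members:
        if owner(x) != p:
            continue  # replicas are only read here; their owners output them
        count = sum(1 for bid, y in members if bid != aid and abs(x - y) <= RADIUS)
        yield aid, (x, count)


def map_update(aid: str, state: Tuple[float, int]) -> Tuple[str, float]:
    """map_(t+1)1 (update): an agent reads only its own state and effects."""
    x, count = state
    return aid, x + 0.5 * count


def tick(agents: Dict[str, float]) -> Dict[str, float]:
    groups: Dict[int, List[Tuple[str, float]]] = defaultdict(list)
    for p, record in map_distribute(agents):
        groups[p].append(record)
    reduced = [kv for p, members in groups.items() for kv in reduce_query(p, members)]
    # In the model above the update runs in the next tick's map task; it is applied here for brevity.
    return dict(map_update(aid, state) for aid, state in reduced)


print(tick({"a": 5.0, "b": 9.0, "c": 11.0, "d": 25.0}))
# {'a': 5.0, 'b': 9.5, 'c': 11.5, 'd': 25.0}
```

Agents `b` and `c` sit in different partitions but see each other through replicas, so each still counts one neighbour.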

While this two-step approach works well for simulations that have only local effects, it does not handle non-local effect assignments. Recall that a non-local effect is an effect assignment by some agent a to some other agent b within a's visible region. For example, if we were to extend the fish simulation to include predators, then we would model a shark attack as a non-local effect assignment from the shark to the fish. Non-local effects require communication during the query phase. We implement this communication using two MapReduce passes, as illustrated in Table 1. The first map task, $\mathit{map}_{t1}$, is the same as before. The first reduce task, $\mathit{reduce}_{t1}$, performs only effect assignments to its local replicas (local effect). These partially aggregated effect values are then distributed to the partitions that own them, where they are combined by the second reduce task, $\mathit{reduce}_{t2}$. This computes the final value for each aggregate (global effect). As before, the update phase is processed in the next map task, $\mathit{map}_{(t+1)1}$. Note that the second map task, $\mathit{map}_{t2}$, is only necessary for distribution and performs no computation, so it can be eliminated in an implementation. We call this model map-reduce-reduce.
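The map-reduce-reduce variant can be sketched the same way, using a hypothetical non-local effect `hits` in which every agent writes 1 to each agent it can see (much like the shark-attack example). The first reducer pre-aggregates writes per target inside its partition, including writes to replicas, and routes each partial sum to the target's owner, where the second reducer combines them. The toy world and all names are assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

WIDTH, RADIUS, PARTS = 10.0, 2.0, 3  # hypothetical 1-D world of three partitions


def owner(x: float) -> int:
    return math.floor(x / WIDTH)


def map_distribute(agents: Dict[str, float]) -> Iterator[Tuple[int, Tuple[str, float]]]:
    """map_t1: owner copy plus replicas, exactly as in the local-effect case."""
    for aid, x in agents.items():
        first = max(0, math.floor((x - RADIUS) / WIDTH))
        last = min(PARTS - 1, math.floor((x + RADIUS) / WIDTH))
        for p in range(first, last + 1):
            yield p, (aid, x)


def reduce_local_effect(p: int, members: List[Tuple[str, float]]) -> Iterator[Tuple[int, Tuple[str, int]]]:
    """reduce_t1: owned agents assign b.hits <- 1 to every visible agent b, owned or replica."""
    partial: Dict[Tuple[str, float], int] = defaultdict(int)
    for aid, x in members:
        if owner(x) != p:
            continue  # only owned agents run their query phase in this partition
        for bid, y in members:
            if bid != aid and abs(x - y) <= RADIUS:
                partial[(bid, y)] += 1  # non-local assignment, pre-aggregated per target
    for (bid, y), value in partial.items():
        yield owner(y), (bid, value)  # ship the partial aggregate to b's owner


def reduce_global_effect(q: int, partials: List[Tuple[str, int]]) -> Dict[str, int]:
    """reduce_t2: combine partial aggregates with the effect's sum combinator."""
    totals: Dict[str, int] = defaultdict(int)
    for bid, value in partials:
        totals[bid] += value
    return dict(totals)


def effects_for_tick(agents: Dict[str, float]) -> Dict[str, int]:
    stage1: Dict[int, List[Tuple[str, float]]] = defaultdict(list)
    for p, record in map_distribute(agents):
        stage1[p].append(record)
    # map_t2 does no computation, so its routing is folded directly into this loop.
    stage2: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for p, members in stage1.items():
        for q, record in reduce_local_effect(p, members):
            stage2[q].append(record)
    hits = {aid: 0 for aid in agents}
    for q, partials in stage2.items():
        hits.update(reduce_global_effect(q, partials))
    return hits


print(effects_for_tick({"a": 5.0, "b": 9.0, "c": 11.0, "d": 25.0, "e": 12.0}))
# {'a': 0, 'b': 1, 'c': 2, 'd': 0, 'e': 1}
```

Agent `c` receives one partial aggregate from each of two partitions, which the second reducer merges into 2; a brute-force count over all pairs gives the same totals.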

Our map-reduce-reduce model relies heavily on the neighbor- 
hood property. The number of replicas that each map task must 
create depends on the size of the agents' visible regions, and the frequency with which agents change partitions depends on the size of their reachable regions.



3.3 The BRACE MapReduce Runtime 

In this section we describe a MapReduce implementation that 
takes advantage of the state-effect pattern and the neighborhood 
property. We introduce BRACE, the Big Red Agent-based Computation Engine, our platform for scalable behavioral simulations. BRACE includes a MapReduce runtime especially optimized for the iterated spatial joins discussed in Section 3.2. We have developed a new system rather than using an existing MapReduce implementation such as Hadoop [23] because behavioral simulations have considerably different characteristics than traditional MapReduce applications such as search log analysis. The goal of BRACE is to process a very large number of ticks efficiently, and to avoid I/O or communication overhead while providing features such as fault tolerance. We describe the main features of our runtime below.

Shared-Nothing, Main-Memory Architecture. In behavioral
simulations, we expect data volumes to be modest, so BRACE exe- 
cutes map and reduce tasks entirely in main memory. For example, 
a simulation with one million agents whose state and effect fields 
occupy 1 KB on average requires roughly 1 GB of main memory. 
Even larger simulations with orders of magnitude more agents will 
still fit in the aggregate main memory of a cluster. Since the query 
phase is computationally expensive, partition sizes are limited by 
CPU cycles rather than main memory size. 

Figure 1 shows the architecture of BRACE. As in typical MapReduce implementations, a master node is responsible for cluster coordination. Unlike in traditional MapReduce runtimes, BRACE's master node only interacts with worker nodes every epoch, which corresponds to a fixed number of ticks. The intuition is that iterations will be quickly processed in main memory by the workers, so we wish to amortize the overheads related to fault tolerance and load balancing. In addition, we carefully allocate tasks of map-reduce-reduce iterations to workers, so that we diminish communication overheads within and across iterations.

Fault Tolerance. Traditional MapReduce runtimes provide fault
tolerance by storing output to a replicated file system and automat- 
ically restarting failed tasks. Since we expect ticks to be quite short 
and they are processed in main memory, it would be prohibitively 
expensive to write output to stable storage between every tick. Fur- 
thermore, since individual ticks are short, the benefit from restart- 
ing a task is likely to be small. 

We employ epoch synchronization with the master to trigger coordinated checkpoints [18] of the main memory of the workers. As the master determines a pre-defined tick boundary for checkpointing, the workers can write their checkpoints independently without global synchronization. As we expect iterations to be short, failures are handled by re-execution of all iterations since the last checkpoint, a common technique in scientific simulations. In fact, we can leverage previous literature to tune the checkpointing interval to minimize the total expected runtime of the whole computation [13].
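The checkpoint-interval choice is only referenced above via [13]. As one hedged illustration, Young's classical first-order approximation sets the computation time between checkpoints to about $\sqrt{2CM}$, where C is the time to write a checkpoint and M is the mean time between failures; this may differ from the model in [13] and should not be read as BRACE's policy. The sketch also assumes one checkpoint per epoch, so the interval is rounded to a whole number of ticks. All function names and numbers are hypothetical.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order optimum: compute time between checkpoints, sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def ticks_per_epoch(checkpoint_cost_s: float, mtbf_s: float, tick_s: float) -> int:
    """Round the interval to whole ticks, assuming one checkpoint per epoch boundary."""
    return max(1, round(young_interval(checkpoint_cost_s, mtbf_s) / tick_s))


# Hypothetical numbers: 10 s per checkpoint, one failure per day, 0.5 s per tick.
print(ticks_per_epoch(10.0, 86_400.0, 0.5))  # 2629
```

Cheaper checkpoints or more frequent failures shrink the interval; because the interval grows only with the square root of M, a cluster that fails four times less often needs checkpoints only half as often.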

Partitioning and Load Balancing. As we have observed in Section 3.2, bounded reachability implies that a given spatial partitioning will remain effective for a number of map-reduce-reduce iterations. Our runtime uses that observation to keep data partitioning stable over time and re-evaluates it at epoch boundaries.

At the beginning of the simulation, the master computes a parti- 
tioning function based on the visible regions of the agents and then 
broadcasts this partitioning to the worker nodes. Each worker be- 
comes responsible for one region of the partitioning. While agents 
change partitions slowly, over time the overall spatial distribution 
may change quite dramatically. For example, the distribution of 
traffic on a road network is likely to be very different at morn- 
ing rush hour than at evening rush hour. This would cause certain 
nodes to become overloaded if we used the same partitioning in 



both cases. To address this, the master periodically requests statis- 
tics from the workers about the number of agents in the owned 
region and the communication and processing costs. The master 
then decides on repartitioning by balancing the cost of redistribu- 
tion with its expected benefit. If the master decides to modify the 
partitioning, it broadcasts the new partitioning to all workers. The 
workers then switch to the new partitioning at a specified epoch 
boundary. 
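The repartitioning decision can be illustrated with a deliberately simple, hypothetical cost model in Python: an epoch is assumed to take time proportional to the agent count of the most loaded partition, new strip boundaries are taken at agent-count quantiles along one axis, and every agent that changes strip costs a fixed `move_cost`. The statistics and cost model BRACE actually uses are not specified in this section, and all names here are illustrative.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def balanced_cuts(xs: Sequence[float], parts: int) -> List[float]:
    """Strip boundaries that give each partition roughly the same number of agents."""
    s = sorted(xs)
    return [s[(i * len(s)) // parts] for i in range(1, parts)]


def strip_of(x: float, cuts: Sequence[float]) -> int:
    return bisect_right(cuts, x)


def max_load(xs: Sequence[float], cuts: Sequence[float], parts: int) -> int:
    counts = [0] * parts
    for x in xs:
        counts[strip_of(x, cuts)] += 1
    return max(counts)


def should_repartition(xs: Sequence[float], old_cuts: Sequence[float], parts: int, epochs_left: int,
                       cost_per_agent_epoch: float, move_cost: float) -> Tuple[bool, List[float]]:
    """Repartition only if the expected saving over the remaining epochs exceeds the migration cost."""
    new_cuts = balanced_cuts(xs, parts)
    saved_load = max_load(xs, old_cuts, parts) - max_load(xs, new_cuts, parts)
    benefit = saved_load * cost_per_agent_epoch * epochs_left
    moved = sum(1 for x in xs if strip_of(x, old_cuts) != strip_of(x, new_cuts))
    return benefit > moved * move_cost, new_cuts


# Rush hour: most agents have crowded into the first of three strips.
crowded = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 25.0]
print(should_repartition(crowded, [10.0, 20.0], 3, epochs_left=5,
                         cost_per_agent_epoch=10.0, move_cost=20.0))  # (True, [3.0, 5.0])
```

With only one epoch left, the same imbalance would not justify moving four agents, which mirrors the trade-off between redistribution cost and expected benefit described above.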

Collocation of Tasks. Since simulations run for many iterations, 
it is important to avoid unnecessary communication between map 
and reduce tasks. We accomplish this by collocating the map and 
reduce tasks for a tick on the same node so that agents that do 
not switch partitions can be sent between tasks via fast memory 
rather than the network. Since agents have limited reachable re- 
gions, the owned set of each partition is likely to remain relatively 
stable across ticks, and so will remain on the same node. Agents 
still need to be replicated, but their original copies do not have to 
be redistributed. This idea was previously explored by the Phoenix 
project for SMP systems [48] and the Map-Reduce-Merge project for individual joins [46], but it is particularly important for long-running behavioral simulations.

Figure 1 shows how collocation works when we allow non-local effect assignments. Solid arrows indicate the flow of agents during a tick. Each node processes a map task and two reduce tasks as described in Section 3.2. The map task replicates agents as appropriate and sends all of the agents that remain in the same partition to the reduce task on the same node. The first reducer computes local effects and sends any updated replicas to the second reduce task on the nodes that own them. The final reducer computes the final effects and sends them to the map task on the same node. Because of the neighborhood property, many agents will be processed on the same node during the next tick.
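The benefit of collocation can be sketched by counting how many owned agents travel from a map task to a reduce task on the same node (in memory) versus across the network. The Python below is a hypothetical accounting exercise that ignores replicas; it compares collocated placement with one where each reduce task runs on a different node from its partition's map task.

```python
from typing import Dict, Tuple


def count_transfers(old_part: Dict[str, int], new_part: Dict[str, int],
                    map_node: Dict[int, int], reduce_node: Dict[int, int]) -> Tuple[int, int]:
    """Count owned agents sent map -> reduce in memory (local) vs over the network (remote)."""
    local = remote = 0
    for aid, p_old in old_part.items():
        if map_node[p_old] == reduce_node[new_part[aid]]:
            local += 1
        else:
            remote += 1
    return local, remote


old = {"a": 0, "b": 0, "c": 1, "d": 1}
new = {"a": 0, "b": 1, "c": 1, "d": 1}  # only b changes partition this tick
collocated = {0: 0, 1: 1}               # partition p's map and reduce tasks both run on node p
shifted = {0: 1, 1: 0}                  # reduce tasks placed on the other node

print(count_transfers(old, new, collocated, collocated))  # (3, 1)
print(count_transfers(old, new, collocated, shifted))     # (1, 3)
```

With collocation, only agents that actually cross a partition boundary pay for network transfer, and limited reachability keeps that number small.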

4. PROGRAMMING AGENT BEHAVIOR 

In this section, we show how to offer a simple programming 
model for a domain scientist, targeting the last requirement of Section 1.1. MapReduce is set-based; a program describes how to process all of the elements in a collection. Simulation developers prefer to describe the behavior of their agents individually, and use
message-passing techniques to communicate between agents. This 
type of programming is closer to the scientific models that describe 
agent behavior. 

We introduce a new programming language: BRASIL, the Big Red Agent Simulation Language. BRASIL embodies agent-centric programming with explicit support for the state-effect pattern, and performs further algebraic optimizations. It bridges the mental model of simulation developers and our MapReduce processing techniques for behavioral simulations. We provide an overview of the main features of BRASIL (Section 4.1) and describe algebraic optimization techniques that can be applied to our scripts (Section 4.2). Formal semantics for our language as well as the proofs of theorems in this section are provided in Appendix B.



4.1 Overview of BRASIL 

BRASIL is an object-oriented language in which each object cor- 
responds to an agent in the simulation. Agents in BRASIL are de- 
fined in a class file that looks superficially similar to Java. The pro- 
grammer can specify fields, methods, and constructors, and these 
can each either be public or private. Unlike in Java, however, each 
field in a BRASIL class must be tagged as either state or effect. 
The BRASIL compiler then enforces the read-write restrictions of 
the state-effect pattern over those fields as described in Section 2.1. Figure 2 illustrates an example of a simple two-dimensional fish simulation; in this simulation, the fish swim about randomly, but avoid each other through the use of imaginary repulsion "forces".

class Fish {
  // The fish location
  public state float x : (x+vx); #range[-1,1];
  public state float y : (y+vy); #range[-1,1];

  // The latest fish velocity
  public state float vx : vx + rand() + avoidx / count * vx;
  public state float vy : vy + rand() + avoidy / count * vy;

  // Used to update our velocity
  private effect float avoidx : sum;
  private effect float avoidy : sum;
  private effect int count : sum;

  /** The query-phase for this fish. */
  public void run() {
    // Use "forces" to repel fish too close
    foreach(Fish p : Extent<Fish>) {
      p.avoidx <- 1 / abs(x - p.x);
      p.avoidy <- 1 / abs(y - p.y);
      p.count <- 1;
    }
  }
}

Figure 2: Class for Simple Fish Behavior

Recall that the state-effect pattern divides the computation into 
a query and an update phase. In BRASIL, the query phase for an 
agent class is expressed by its run() method. State fields remain
read-only and all effect assignments are aggregated using the aggre- 
gate function specified at the effect field declaration. Effect fields 
are similar to aggregator variables in Sawzall [36]; indeed, we use
the Sawzall operator <- for writing to effect fields. In our fish 
simulation, for example, each fish repels nearby fish via a "force" 
inversely proportional to the distance between them. The update 
phase is specified as a collection of update rules attached to each 
state field. These rules can only read values of other fields in this 
agent. In our example, fish velocity vectors get updated based on 
the avoidance factors and then perturbed by a random amount. 

There are some important restrictions in BRASIL's programming constructs. First, BRASIL only supports iteration over a set or list via a foreach-loop. This eliminates arbitrary looping, which is not available in algebraic database languages. Second, there is an interplay between foreach-loops and effects: effect variables can only be read outside of a foreach-loop, and all assignments within a foreach-loop are aggregated. This powerful restriction allows us to treat the entire program, and not just the communication across map and reduce operations, as a data-flow query plan.

BRASIL also has a special programming construct to enforce the neighborhood property outlined in Section 2.1. Every state field that encodes spatial location may be tagged with a visibility and reachability constraint. While it is possible to generalize this concept to arbitrary constraints, in our current implementation the constraints are (hyper)rectangles. For example, the constraint attached to the x field in Figure 2 means that [-1, 1] is the interval that this fish can inspect or move with respect to the x coordinate. In our fish example, the foreach-loop will therefore only be able to affect fish within this range. In addition, the update rule is guaranteed to crop any changes to the x coordinate to at most one unit.
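To connect Figure 2 with the state-effect pattern, here is a hedged Python rendering of one tick of the fish class, combining the foreach aggregation, the visibility filter, and the reachability crop described above. It is not generated BRASIL output, and it makes assumptions that Figure 2 leaves open: `rand()` is an injected function, fish pairs with zero separation on either axis are skipped to avoid division by zero, a fish with `count == 0` gets no avoidance term, and update rules read the values the fields held before the update.

```python
from dataclasses import dataclass
from typing import Callable, List

RANGE = 1.0  # #range[-1,1]: visible and reachable interval on each axis


@dataclass
class Fish:
    x: float   # state
    y: float   # state
    vx: float  # state
    vy: float  # state
    avoidx: float = 0.0  # effect, combined with sum
    avoidy: float = 0.0  # effect, combined with sum
    count: int = 0       # effect, combined with sum


def crop(delta: float) -> float:
    """Reachability: limit a position change to [-RANGE, RANGE]."""
    return max(-RANGE, min(RANGE, delta))


def query_phase(school: List[Fish]) -> None:
    """run(): every fish pushes every visible fish p, writing only p's effect fields."""
    for f in school:
        for p in school:
            dx, dy = f.x - p.x, f.y - p.y
            if p is f or abs(dx) > RANGE or abs(dy) > RANGE:
                continue  # not visible
            if dx == 0 or dy == 0:
                continue  # assumption: skip pairs that would divide by zero
            p.avoidx += 1 / abs(dx)
            p.avoidy += 1 / abs(dy)
            p.count += 1


def update_phase(school: List[Fish], rand: Callable[[], float]) -> None:
    """Update rules; assumed to read the values each field had before the update."""
    for f in school:
        ax = f.avoidx / f.count if f.count else 0.0  # assumption: no avoidance when count == 0
        ay = f.avoidy / f.count if f.count else 0.0
        new_vx = f.vx + rand() + ax * f.vx
        new_vy = f.vy + rand() + ay * f.vy
        f.x += crop(f.vx)  # x : (x+vx), with the change cropped to the reachable range
        f.y += crop(f.vy)
        f.vx, f.vy = new_vx, new_vy
        f.avoidx = f.avoidy = 0.0  # effects are reset for the next tick
        f.count = 0


def no_noise() -> float:
    return 0.0


school = [Fish(0.0, 0.0, 0.5, 0.5), Fish(0.5, 0.25, 0.0, 0.0)]
for _ in range(2):
    query_phase(school)
    update_phase(school, no_noise)
print(school[0].x, school[0].y)  # 1.5 1.5
```

In the second tick the first fish's velocity exceeds one unit per tick, and the crop keeps each position change within the reachable range.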

Note that visibility has an interplay with agent references: it is 
possible that a reference to another agent is fine initially, but vio- 
lates the visibility constraint as that other agent moves relative to 
the one holding the reference. For that reason, BRASIL employs 
weak reference semantics for agent references, similar to weak ref- 
erences in Java. If another agent moves outside of the visible re- 
gion, then all references to it will resolve to NIL. 
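The weak-reference semantics can be sketched in Python by resolving a reference only while the target lies within the holder's visible rectangle. The class name `WeakAgentRef` and the `RANGE` constant are hypothetical, and `None` plays the role of NIL.

```python
from dataclasses import dataclass
from typing import Dict, Optional

RANGE = 1.0  # hypothetical visible half-width on each axis, as in #range[-1,1]


@dataclass
class Agent:
    x: float
    y: float


class WeakAgentRef:
    """Agent reference that resolves to None (NIL) once the target leaves the holder's visible region."""

    def __init__(self, world: Dict[str, Agent], holder: str, target: str) -> None:
        self.world, self.holder, self.target = world, holder, target

    def get(self) -> Optional[Agent]:
        h = self.world[self.holder]
        t = self.world.get(self.target)
        if t is None or abs(t.x - h.x) > RANGE or abs(t.y - h.y) > RANGE:
            return None
        return t


world = {"shark": Agent(0.0, 0.0), "fish": Agent(0.5, 0.5)}
prey = WeakAgentRef(world, "shark", "fish")
print(prey.get())      # Agent(x=0.5, y=0.5)
world["fish"].x = 3.0  # the fish swims out of the shark's visible region
print(prey.get())      # None
```

Because resolution is checked on every access, a reference that was valid when created can never be used to read or write an agent outside the visible region.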

Note that this gives a different semantics for visibility than the one presented in Section 3. BRASIL uses visibility to determine how agent references are resolved, while the BRACE runtime uses visibility to determine agent replication and communication. The BRASIL semantics are preferable for a developer, because they are easy to understand and hide MapReduce details. Fortunately, as we prove formally in Appendix B, these are equivalent.



Theorem 1. The BRASIL semantics for visibility and the BRACE implementation of visibility are equivalent.

While programming features in BRASIL may seem unusual, everything in the language follows from the state-effect pattern and neighborhood property. As these are natural properties of behavioral simulations, programming these simulations becomes relatively straightforward. Indeed, a large part of our traffic simulation in Section 5 was implemented by a domain scientist.

4.2 Optimization 

We compile BRASIL into a well-understood data-flow language. In our previous work on computer games, we used the relational algebra to represent our data flow [44]. However, for distributed simulations, we have found the monad algebra [6, 28, 35, 41] (the theoretical foundation for XQuery [28]) to be a much more appropriate fit. In particular, the monad algebra has a MAP primitive for descending into the components of its nested data model; this makes it a much more natural companion to MapReduce than the relational algebra.

We present the formal translation to the monad algebra in Appendix B, together with several theorems regarding its usage in optimization. Most of these optimizations are the same as those that would be present in a relational algebra query plan: algebraic rewrites and automatic indexing. In fact, any monad algebra expression on flat tables can be converted to an equivalent relational algebra expression and vice versa [35]; rewrites and indexing on the relational form carry back into the monad algebra form. In particular, many of the techniques used by Pathfinder [42] to process XQuery with relational query plans apply to the monad algebra.
Effect Inversion. An important optimization that is unique to our framework involves the assignment of non-local effects. If non-local effect assignments can be eliminated, then we are able to process our MapReduce computations with one MapReduce pass instead of two (Section 3). Consider again the program of Figure 2. We may rewrite its foreach-loop as

foreach(Fish p : Extent<Fish>) {
  avoidx <- 1 / abs(p.x - x);
  avoidy <- 1 / abs(p.y - y);
  count <- 1;
}
This rewritten expression does not change the results of the simu- 
lation, but only assigns effects locally. 
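To make the rewrite concrete, the Python sketch below (not BRACE or BRASIL code) runs one query phase both ways. It assumes, hypothetically, that the original loop of Figure 2 pushed the same three values into each neighbor `p`'s effect fields, that all three effects are sums reset to 0 at each tick, that a fish does not iterate over itself, and that no two fish share an x or y coordinate; the `Fish` class and the `push_tick`/`pull_tick` names are ours. Because each contribution `1 / abs(p.x - x)` is symmetric in the two fish, pulling it into one's own effects gives the same aggregates as pushing it into the neighbor's.

```python
from dataclasses import dataclass

@dataclass
class Fish:
    x: float
    y: float
    # Effect fields, assumed here to be sums reset to 0 at the start of a tick.
    avoidx: float = 0.0
    avoidy: float = 0.0
    count: int = 0

def reset(school):
    for f in school:
        f.avoidx, f.avoidy, f.count = 0.0, 0.0, 0

def push_tick(school):
    """Hypothetical non-local form: each fish writes into its neighbors' effects."""
    reset(school)
    for me in school:
        for p in school:
            if p is me:
                continue          # assumption: a fish does not visit itself
            p.avoidx += 1 / abs(p.x - me.x)
            p.avoidy += 1 / abs(p.y - me.y)
            p.count += 1

def pull_tick(school):
    """Inverted form: each fish reads its neighbors and writes only its own effects."""
    reset(school)
    for me in school:
        for p in school:
            if p is me:
                continue
            me.avoidx += 1 / abs(p.x - me.x)
            me.avoidy += 1 / abs(p.y - me.y)
            me.count += 1

def effects(school):
    return [(f.avoidx, f.avoidy, f.count) for f in school]
```

Running `push_tick` and `pull_tick` on two copies of the same school should produce identical `effects(...)`; only the second form is expressible with local effect assignments.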

There are two main results regarding effect inversion. While we prove them formally in Appendix B.2, we can state them informally here, and give some intuition regarding their usage.

Theorem 2. Effect inversion is always possible if there are no visibility constraints.

When there are no visibility constraints, each agent can read any other agent. Hence, we can produce a script where an agent simulates the behavior of other agents, checks for effects that are assigned to itself, and then assigns them locally. Unfortunately, this new script is clearly much more computationally expensive. Hence, to be useful in practice, we need to use other optimization techniques to simplify the new script.

Effect inversion is not always possible when the simulation has visibility constraints. Intuitively, an agent may use non-local effects to act as a communication proxy between two other agents that are not visible to one another. However, the state-effect pattern ensures that an agent can only receive information from another agent if they have a third (not necessarily distinct) agent that is visible to both. This provides us with another result.

Theorem 3. If the visibility constraint on a script is a distance bound, there is an equivalent script with a constraint at most twice that distance bound that has only local effect assignments.

Increasing the visibility bound increases the number of replicas that each node has to store. Hence this optimization eliminates the extra communication round at the cost of more information to be sent during the remaining communication round.
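A back-of-the-envelope sketch of this trade-off follows, under assumptions that are ours and not the paper's: agents are spread uniformly with a given density, partitions are square (or hypercube) grid cells of side `cell_width`, and visibility is a box of half-width `r`, so a partition's visible region is its cell grown by `r` on every side. The replicas a partition must receive are then the agents in that border band, and doubling `r` roughly doubles the band when `r` is small relative to the cell.

```python
def replicas_per_partition(density, cell_width, r, dims=2):
    """Expected number of non-owned agents a partition must receive.

    Assumptions (illustrative only): uniform agent density, hypercube grid
    cells of side `cell_width`, and a box-shaped visibility constraint of
    half-width `r`, so VR(p) is the cell grown by `r` on every side.
    """
    grown = (cell_width + 2 * r) ** dims   # volume of VR(p)
    owned = cell_width ** dims             # volume of the owned cell
    return density * (grown - owned)

print(replicas_per_partition(1.0, 100, 5))    # 2100.0
print(replicas_per_partition(1.0, 100, 10))   # 4400.0
```

With 100-unit cells at unit density, doubling the bound from 5 to 10 raises the replicated band from 2,100 to 4,400 agents per partition, which is the extra data sent in the single remaining communication round.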

5. EXPERIMENTS 

In this section, we present experimental results using two distinct real-world behavioral simulation models we have coded using BRACE. We focus on the following: (i) We validate the effectiveness of the BRASIL optimizations introduced in Section 4.2. In fact, these optimizations allow us to approach the efficiency of hand-optimized simulation code (Section 5.2); (ii) We evaluate BRACE's MapReduce runtime implementation over a cluster of machines. We measure simulation scale-up via spatial data partitioning as well as load balancing (Section 5.3).
5.1 Setup 

Implementation. The prototype BRACE MapReduce runtime is implemented in C++ and uses MPI for inter-node communication. Our BRASIL compiler is written in Java and directly generates C++ code that can be compiled with the runtime.

Our prototype includes a generic KD-tree based spatial index capability [3]. We use a simple rectilinear grid partitioning scheme, which assigns each grid cell to a separate slave node. A one-dimensional load balancer periodically receives statistics from the slave nodes, including computational load and number of owned agents; from these it heuristically computes a new partition trying to balance improved performance against estimated migration cost. Checkpointing is not yet integrated into BRACE's implementation. We believe this is not a problem for the cluster sizes we evaluate, given the low likelihood of worker failure during a computation.
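The balancing heuristic itself is not spelled out here. As a rough, hypothetical sketch of the simplest version of the idea, the Python below recomputes the cut points of a one-dimensional partition so that each node owns roughly the same number of agents. Unlike BRACE's balancer, it ignores computational load statistics and migration cost; `rebalance` and `owner` are illustrative names.

```python
import bisect

def rebalance(xs, num_nodes):
    """Hypothetical 1-D rebalancing step: pick num_nodes - 1 cut points along x
    so that each node owns roughly len(xs) / num_nodes agents. Per-agent load
    and migration cost, which BRACE's balancer weighs, are ignored here."""
    s = sorted(xs)
    return [s[(k * len(s)) // num_nodes] for k in range(1, num_nodes)]

def owner(x, cuts):
    """Node k owns the half-open interval [cuts[k-1], cuts[k])."""
    return bisect.bisect_right(cuts, x)
```

For a fish population that has split into two distant schools, equal-count cut points still give every node a similar share of agents, whereas a static grid would leave the nodes between the schools idle.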

We plan to integrate more sophisticated algorithms for all these 
components in future work. But our current prototype already 
demonstrates good performance and automatic scaling of realistic 
behavioral simulations written in BRASIL. 

Simulation Workloads. We have implemented realistic traffic and 
fish school simulations in BRASIL. The traffic simulation includes 
the lane-changing and acceleration models of the state-of-the-art, 
open-source MITSIM traffic simulator [47]. MITSIM is a single-node program, so we compare its performance against our BRASIL reimplementation of its model also running on a single node. We simulate a linear segment of highway, and scale up the size of the problem by extending the length of the segment.

The fish simulation implements a recent model of information flow in animal groups [12]. In this model the "ocean" is unbounded,
and the spatial distribution of fish changes dramatically as "in- 
formed individuals" guide the movements of others in the school. 

Neither of these simulations uses non-local effect assignments; therefore we need only a single reducer per node. To evaluate our effect inversion optimization, we modified the fish simulation to create a predator simulation that uses non-local assignments. It is similar in spirit to artificial society simulations [27]. Appendix C describes these simulation models in more detail. We measure total simulation time in our single-node experiments and tick throughput (agent ticks per second) when scaling up over multiple nodes. In all measurements we eliminate start-up transients by discarding initial ticks until a stable tick rate is achieved.

Hardware Setup. We ran all of our experiments in the Cornell Web Lab cluster [2]. The cluster contains 60 nodes interconnected
by a pair of 1 gigabit/sec Port Summit X450a Ethernet Switches. 
Each node has two Quad Core Intel Xeon, 2.66GHz, processors 
with 4MB cache each and 16 GB of main memory. 

5.2 BRASIL Optimizations 

We first compare the single-node performance of our traffic sim- 
ulation to MITSIM. The main optimization in this case is spatial 
indexing. For a meaningful comparison, we validate the aggregate 
traffic statistics produced by our BRASIL reimplementation against 
those produced by MITSIM. Details of our validation procedure appear in Appendix C.

Figure 3 compares the performance of MITSIM against our BRACE reimplementation of its model. Without spatial indexing, BRACE's performance degrades quadratically with increasing segment length. This is expected: in this simulation, the number of agents grows linearly with segment length, and without indexing every vehicle enumerates and tests every other vehicle during each tick. With spatial indexing enabled, BRACE converts this behavior to an orthogonal range query, resulting in log-linear growth, as confirmed by Figure 3. BRACE's spatial indexing achieves performance that is comparable to, but lower than, MITSIM's hand-coded nearest-neighbor implementation. Our optimization techniques generalize to nearest-neighbor indexing, and adding this to BRACE is planned future work. With this enhancement, we expect to achieve performance parity with MITSIM.
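The quadratic versus log-linear behavior can be reproduced in miniature. The sketch below uses a one-dimensional stand-in for a spatial index (a sorted array searched with the standard `bisect` module, not BRACE's KD-tree): each vehicle looks for all others within distance `d` along the road, either by testing every vehicle or by two binary searches over positions sorted once per call. Function names and the distance parameter `d` are illustrative.

```python
import bisect

def neighbors_bruteforce(positions, d):
    """Every vehicle tests every other vehicle: O(n^2) work per tick."""
    return [sorted(j for j, q in enumerate(positions)
                   if j != i and abs(q - p) <= d)
            for i, p in enumerate(positions)]

def neighbors_indexed(positions, d):
    """Sort once, then answer each range query [p - d, p + d] by binary search:
    O(n log n) plus the size of the output."""
    order = sorted(range(len(positions)), key=positions.__getitem__)
    keys = [positions[i] for i in order]
    result = []
    for i, p in enumerate(positions):
        lo = bisect.bisect_left(keys, p - d)    # first position >= p - d
        hi = bisect.bisect_right(keys, p + d)   # first position >  p + d
        result.append(sorted(order[k] for k in range(lo, hi) if order[k] != i))
    return result
```

Both functions return the same neighbor lists; as in the experiment, the indexed version does more work per probe when `d` (the visibility range) grows, because each range query returns more results.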

We observed similar log-linear versus quadratic performance 
when scaling up the number of agents in the fish simulation in a 
single node. We thus omit these results. When we increase the visibility range, however, the performance of the KD-tree indexing decreases, since more results are produced for each index probe (Figure 4). Still, indexing yields a two- to threefold improvement over a range of visibility values.

In addition to indexing, we also measure the performance gain of 
eliminating non-local effect assignments through effect inversion. 
Only the predator simulation has non-local effect assignments, so we report results exclusively on this model. We run two versions of the predator simulation, one with non-local assignments and the other with non-local assignments eliminated by effect inversion. We run both scripts with and without KD-tree indexing enabled on 16 slave nodes, and with BRACE configured to have two reduce passes in the first case and only a single reduce pass in the second case. Our results are displayed in Figure 5. Effect inversion increases agent tick throughput from 3.59 million (Idx-Only) to 4.36 million (Idx+Inv) with KD-tree indexing enabled, and from 2.95 million (No-Opt) to 3.63 million (Inv-Only) with KD-tree indexing disabled. This represents an improvement of more than 20% in each case, demonstrating the importance of this optimization.
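The percentages follow directly from the reported throughputs (millions of agent ticks per second); a quick check:

```python
# Throughputs reported above for the predator simulation on 16 slave nodes.
throughput = {"No-Opt": 2.95, "Inv-Only": 3.63, "Idx-Only": 3.59, "Idx+Inv": 4.36}

def gain(before, after):
    """Relative throughput improvement of `after` over `before`, in percent."""
    return (throughput[after] / throughput[before] - 1.0) * 100.0

print(f"with indexing:    {gain('Idx-Only', 'Idx+Inv'):.1f}%")   # 21.4%
print(f"without indexing: {gain('No-Opt', 'Inv-Only'):.1f}%")    # 23.1%
```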

5.3 Scalability of the BRACE Runtime 

We now explore the parallel performance of BRACE's MapRe- 
duce runtime on the traffic and fish school simulations as we scale 
the number of slave nodes from 1 to 36. The size of both simula- 
tions is scaled linearly with the number of slaves, so we measure 
scale-up rather than speed-up. 

The traffic simulation represents a linear road segment with con- 
stant up-stream traffic. As a result, the distribution on the segment 
is nearly uniform, and load is always balanced among the nodes. 
Therefore, throughput grows linearly with the number of nodes even if load balancing is disabled (Figure 6). The sudden drop around 20 nodes is an artifact of the multi-switch architecture of the Web Lab cluster on which we ran our experiments: performance degrades when not all nodes can be chosen on the same switch.

In the fish simulation, fish move in schools led by informed individuals [12]. In our experiment, there are two classes of informed individuals, trying to move in two different fixed directions. The spatial distribution of fish, and consequently the load on each slave node, changes over time. Figure 7 shows the scalability of this simulation with and without load balancing. Without load balancing,
two fish schools eventually form in nodes at the extremes of sim- 
ulated space, while the load at all other nodes falls to zero. With 
load balancing, partition grids are adjusted periodically to assign 
roughly the same number of fish to each node, so throughput in- 
creases linearly with the number of nodes. 

Figure 8 confirms this. With load balancing enabled, the time per
simulation epoch is essentially flat; with load balancing disabled, 
the epoch time gradually increases to a value that reflects all agents 
being simulated by only two nodes. 

6. RELATED WORK 

Much of the existing work on behavioral simulations has focused on task-parallel discrete event simulation systems [8, 34, 14, 31, 50]. Such systems employ either conservative or optimistic protocols to detect conflicts and preempt or roll back simulation tasks. The strength of local interactions and the time-stepped model used in behavioral simulations lead to unsatisfactory performance, as shown in attempts to adapt discrete event simulators to agent-based simulations [26, 25].

Platforms specifically targeted at agent-based models have been developed, such as Swarm [32], Mason [29], and Player/Stage [20]. These platforms offer tools to facilitate simulation programming, but most rely on message-passing abstractions with implementations inspired by discrete event simulators, so they suffer in terms of performance and scalability. A few recent systems attempt to distribute agent-based simulations over multiple nodes without exploiting application properties such as visibility and time-stepping [24, 37]. This leads either to poor scale-up or to unrealistic restrictions on agent interactions.



Regarding join processing with MapReduce, Zhang et al. [49] compute spatial joins by an approach similar to ours when only local effect assignments are allowed. Their mapper partitions are derived using spatial index techniques rather than by reasoning about the application program, and they do not discuss iterated joins, an important consideration for our work. Locality optimizations have been studied for MapReduce on SMPs [48] and for MapReduce-Merge [46]; in this paper we consider the problem in a distributed main-memory MapReduce runtime.

Data-driven parallelization techniques have also been studied in parallel databases [16, 21] and data parallel programming languages [14, 39]. However, it is unnatural and inefficient to use either
SQL or set-operations exclusively to express flexible computation 
over individuals as required for behavioral simulations. 

Given this situation, behavioral simulation developers have resorted to hand-coding parallel implementations of specific simulation models [19, 33], or trading model accuracy for scalability and ease of implementation [7, 43]. To the best of our knowledge, our
approach is the first to bring both programmability and scalability 
through data parallelism to behavioral simulations. 

7. CONCLUSIONS 

In this paper we show how MapReduce can be used to scale 
behavioral simulations across clusters by abstracting these sim- 
ulations as iterated spatial joins. To efficiently distribute these 
joins we leverage several properties of behavioral simulations to 
get a shared-nothing, in-memory MapReduce framework called 
BRACE, which exploits collocation of mappers and reducers to 
bound communication overhead. In addition, we present a new 
scripting language for our framework called BRASIL, which hides 
all the complexities of modeling computations in MapReduce and 
parallel programming from domain scientists. BRASIL scripts can 
be compiled into our MapReduce framework and allow for alge- 
braic optimizations in mappers and reducers. We perform an ex- 
perimental evaluation with two real-world behavioral simulations 
to show that BRACE has nearly linear scalability as well as single- 
node performance comparable to a hand-coded simulator. 
APPENDIX

A. SPATIAL JOINS IN MAPREDUCE 

In this appendix, we formally develop the map and reduce func- 
tions for processing a single tick of a behavioral simulation. 
Formalizing Agents and Spatial Partitioning. We first introduce our notation for agents and their state and effect attributes. We denote an agent as $a = (\mathit{oid}, s, e)$, where $s$ is a vector of the agent's state attributes and $e$ is a vector of its effects. To refer to an agent or its attributes at a tick $t$, we will write $a^t$, $s^t$, or $e^t$. Since effect attributes are aggregated using combinator functions, they need to be reset at the end of every tick. We will use $\emptyset$ to refer to the vector of identity values of the combinators, one for each effect (e.g., 0 for a sum). Finally, we use $\oplus$ to denote the aggregate operator that combines effect variables according to the appropriate combinator.

The neighborhood property implies that some subset of each agent's state attributes are spatial attributes that determine an agent's position. For an agent $a = (\mathit{oid}, s, e)$, we denote this spatial location $\ell(s) \in L$, where $L$ is the spatial domain. Given an agent $a$ at location $l$, the visible region of $a$ is $VR(l) \subseteq L$.

Both the map and reduce tasks in our framework will have access to a spatial partitioning function $P : L \to \mathcal{P}$, where $\mathcal{P}$ is a set of partition ids. This partitioning function can be implemented in multiple ways, such as a regular grid or a quadtree. We define the owned set of a partition $p$ as the inverse image of $p$ under $P$, i.e., the set of all locations assigned to $p$. Since each location has an associated visible region, we can also define the visible region of a partition as $VR(p) = \bigcup_{l \in L,\, P(l) = p} VR(l)$. This is the set of all locations that might be visible by some agent in $p$.
Simulations with Local Effects Only. Since the query phase of 
an agent can only depend on the agents inside its visible region, 
the visible region of a partition contains all of the data necessary to 
execute the query phase for its entire owned region. We will take 
advantage of this by replicating all of the agents in this region at p 
so that the query phase can be executed without communication. 

Figure 9 shows the map and reduce functions for processing tick $t$ when there are only local effect assignments. At tick $t$, the map function performs the update phase from the previous tick, and the reduce function performs the query phase. The map function takes as input an agent with state and effect variables from the previous tick ($a^{t-1}$), and updates the state variables to $s^t$ and the effect attributes to $\emptyset$. During the very first tick of the simulation, $e^{t-1}$ is undefined, so $s^t$ will be set to a value reflecting the initial simulation state. The map function emits a copy of the updated agent keyed by partition for each partition containing the agent in its visible set ($\ell(s^t) \in VR(p)$). This has the effect of replicating the agent $a$ to every partition that might need it for query phase processing. The amount of replication depends on the partitioning function and on the size of each agent's visible region.

The reduce function receives as input all agents that are sent to a particular partition $p$. This includes the agents in $p$'s owned region, as well as replicas of all the agents that fall in $p$'s visible region. The reducer will execute the query phase and compute effect variables for all of the agents in its owned region (agents $i$ such that $P(\ell(s_i^t)) = p$). This requires no communication, since the query phase of an agent in $p$'s owned region can only depend on the agents in $p$'s visible region, all of which are replicated at the same reducer. The reducer outputs agents with updated effect attributes to be processed in the next tick.
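A minimal executable sketch of the local-effects scheme follows. It assumes a one-dimensional space partitioned into a regular grid of width `W`, a visibility bound `R` (so $VR(p)$ is the cell grown by `R` on each side), a hypothetical query rule that counts visible neighbors, and a hypothetical update rule that moves each agent by half its effect value. Agents are `(oid, x, e)` tuples; `map_tick`, `reduce_tick`, and `run_tick` are our names, and the in-memory dictionary plays the role of the shuffle. Comparing against a centralized tick shows that replication makes each reducer's query phase self-contained.

```python
import math
from collections import defaultdict

W = 10.0  # hypothetical grid-cell width used by the partitioning function P
R = 2.0   # hypothetical visibility bound: an agent sees others within distance R

def P(x):
    """Spatial partitioning function: a regular 1-D grid of width W."""
    return math.floor(x / W)

def update(x, e):
    """Hypothetical update rule for tick t-1: move right by half the effect."""
    return x + 0.5 * e

def query(oid, x, visible_agents):
    """Hypothetical query phase: count the other agents within distance R."""
    return float(sum(1 for o2, x2, _ in visible_agents
                     if o2 != oid and abs(x2 - x) <= R))

def map_tick(agent):
    """Update the agent, reset its effect, and emit a copy to every p with x in VR(p)."""
    oid, x, e = agent
    x = update(x, e)
    # VR(p) = [p*W - R, (p+1)*W + R), so these are exactly the partitions to reach.
    return [(p, (oid, x, 0.0)) for p in range(P(x - R), P(x + R) + 1)]

def reduce_tick(p, agents):
    """Query phase for the agents p owns; replicas are read-only context."""
    return [(oid, x, query(oid, x, agents))
            for oid, x, _ in agents if P(x) == p]

def run_tick(agents):
    groups = defaultdict(list)          # the shuffle: group map output by key p
    for agent in agents:
        for p, copy in map_tick(agent):
            groups[p].append(copy)
    out = []
    for p, group in groups.items():
        out.extend(reduce_tick(p, group))
    return sorted(out)

def run_tick_centralized(agents):
    """Reference: the same tick on one machine that sees every agent."""
    moved = [(oid, update(x, e), 0.0) for oid, x, e in agents]
    return sorted((oid, x, query(oid, x, moved)) for oid, x, _ in moved)
```

Each agent is output exactly once, by the reducer that owns it, because its owner partition is always among the partitions it is replicated to.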

Simulations with Non-Local Effects. The method above only works when all effect assignments are local. If an agent $a$ makes an effect assignment to some agent $b$ in its visible region, then it must communicate that effect to the reducer responsible for processing $b$. Figure 10 shows the complete map and reduce functions to handle simulations with non-local effect assignments. The first map function is the same as in the local effect case: each agent is partitioned and replicated as necessary. As before, the first reduce function computes the query phase for the agents in $p$'s owned set and computes effect values. In this case, however, it can only compute intermediate effect values $f$, since it does not have the effects assigned at other nodes. This first reducer outputs one pair for every agent, including replicas, that has its effects updated. These agents are keyed with the partition that owns them, so that all replicas of the same agent go to the same node.

$$\mathit{map}_t(-, a^{t-1}) = [(p, (\mathit{oid}, s^t, \emptyset)) \text{ where } \ell(s^t) \in VR(p)]$$
$$\mathit{reduce}_t(p, [(\mathit{oid}_i, s_i^t, e)]) = [(p, (\mathit{oid}_i, s_i^t, e_i^t)) \ \forall i \text{ s.t. } P(\ell(s_i^t)) = p]$$

Figure 9: Map and reduce functions with local effects only.

$$\mathit{map}^1_t(-, a^{t-1}) = [(p, (\mathit{oid}, s^t, \emptyset)) \text{ where } \ell(s^t) \in VR(p)]$$
$$\mathit{reduce}^1_t(p, [(\mathit{oid}_i, s_i^t, e)]) = [(P(\ell(s_i^t)), (\mathit{oid}_i, s_i^t, f_i^t)) \ \forall i \text{ s.t. } f_i^t \neq \emptyset]$$
$$\mathit{map}^2_t(k, a) = (k, a)$$
$$\mathit{reduce}^2_t(p, [(\mathit{oid}_i, s_i^t, f_i^t)]) = [(p, (\mathit{oid}_i, s_i^t, \textstyle\bigoplus_j f_j^t)) \ \forall j \text{ s.t. } \mathit{oid}_i = \mathit{oid}_j]$$

Figure 10: Map and reduce functions with non-local effects.

The second map function is the identity, and the second reduce 
function performs the aggregation necessary to complete the query 
phase. It receives all of the updated replicas of all of the agents 
in its owned region and applies the $\oplus$ operation to compute the
final effect values and complete the query phase. Each reducer will 
output an updated copy of each agent in its owned set. 
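The two-pass scheme can be sketched in the same style as before, again only as an illustration under our own assumptions: a 1-D grid of width `W`, visibility bound `R`, no update phase, and a hypothetical non-local rule in which every agent assigns `b.e <- 1` to each agent `b` strictly to its right within distance `R`, with `e` summed. Figure 10 forwards only agents whose intermediate effects differ from the identity; this sketch additionally forwards every owned agent so that the second reducer can output its whole owned set. Function names are ours.

```python
import math
from collections import defaultdict

W = 10.0  # hypothetical grid-cell width
R = 2.0   # hypothetical visibility bound

def P(x):
    return math.floor(x / W)

def pushes_to(x_src, x_dst):
    """Hypothetical non-local rule: `dst.e <- 1` for every agent strictly to the
    right of the source and within distance R."""
    return x_src < x_dst <= x_src + R

def map1(agent):
    """Same replication as the local-effects map (update phase omitted)."""
    oid, x, _ = agent
    return [(p, (oid, x, 0.0)) for p in range(P(x - R), P(x + R) + 1)]

def reduce1(p, agents):
    """Query phase of owned agents; intermediate effects f may target replicas,
    so results are keyed by the partition that owns the target."""
    f = defaultdict(float)
    for _, x, _ in agents:
        if P(x) != p:
            continue                       # only owned agents run their script
        for oid2, x2, _ in agents:
            if pushes_to(x, x2):
                f[oid2] += 1.0
    out = []
    for oid, x, _ in agents:
        # Owned agents always (assumption of this sketch); replicas only if touched.
        if P(x) == p or f[oid] != 0.0:
            out.append((P(x), (oid, x, f[oid])))
    return out

def reduce2(p, agents):
    """Combine (here: sum) the partial effects of all copies of each owned agent."""
    total, pos = defaultdict(float), {}
    for oid, x, f in agents:
        total[oid] += f
        pos[oid] = x
    return [(oid, pos[oid], total[oid]) for oid in pos]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def run_two_pass(agents):
    g1 = shuffle(kv for a in agents for kv in map1(a))
    # map2 is the identity, so reduce1's output is shuffled directly.
    g2 = shuffle(kv for p, group in g1.items() for kv in reduce1(p, group))
    return sorted(t for p, group in g2.items() for t in reduce2(p, group))

def run_centralized(agents):
    """Reference: apply the same rule with a global view of all agents."""
    return sorted((oid, x, float(sum(1 for _, x2, _ in agents if pushes_to(x2, x))))
                  for oid, x, _ in agents)
```

For example, agents at x = 9, 10, and 11 straddle the boundary between partitions 0 and 1; the agent at 11 receives one partial effect from each partition, and the second reduce pass sums them to 2.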

B. FORMAL SEMANTICS OF BRASIL 

In this section, we provide a more formal presentation of the semantics of BRASIL than the one presented in Section 4. In particular, we show how to convert BRASIL expressions into monad algebra expressions for analysis and optimization. We also prove several results regarding effect inversion, introduced in Section 4.2, and illustrate the resulting trade-offs between computation and communication.

For the most part, our work will be in the traditional monad algebra. We refer the reader to the original work on this algebra [6, 28, 35, 41] for its basic operators and nested data model. We also use standard definitions for the derived operations like cartesian product and nesting. For example, we define cartesian product as

$$f \times g := (1: f,\ 2: g) \circ \mathrm{PAIRWITH}_1 \circ \mathrm{FLATMAP}(\mathrm{PAIRWITH}_2) \qquad (1)$$

For the purpose of readability, composition in (1) and the rest of our presentation is read left-to-right; that is, $(f \circ g)(x) = g(f(x))$.
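To see how equation (1) produces a cartesian product under left-to-right composition, here is a small Python model of the operators involved. It is an illustration only: sets are modeled as Python lists, tuples as dicts keyed by position (1, 2, ...), and `compose`, `tup`, `pairwith`, `flatmap`, and `product` are our names for the algebra's operators.

```python
from functools import reduce

# Sets are modeled as Python lists; tuples as dicts keyed by position (1, 2, ...).

def compose(*fs):
    """Left-to-right composition: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), fs, x)

def tup(*fs):
    """Tuple construction (1: f1, 2: f2, ...), each component applied to the input."""
    return lambda x: {i + 1: f(x) for i, f in enumerate(fs)}

def pairwith(i):
    """PAIRWITH_i: one copy of the tuple per element of its set-valued attribute i."""
    return lambda t: [{**t, i: v} for v in t[i]]

def flatmap(f):
    """FLATMAP(f): apply a set-valued f to each element and union the results."""
    return lambda s: [y for x in s for y in f(x)]

def product(f, g):
    """Equation (1): f x g := (1: f, 2: g) o PAIRWITH_1 o FLATMAP(PAIRWITH_2)."""
    return compose(tup(f, g), pairwith(1), flatmap(pairwith(2)))

pairs = product(lambda x: x["a"], lambda x: x["b"])({"a": [1, 2], "b": ["u", "v"]})
print(pairs)
# [{1: 1, 2: 'u'}, {1: 1, 2: 'v'}, {1: 2, 2: 'u'}, {1: 2, 2: 'v'}]
```

PAIRWITH_1 fans out over the first set while keeping the second set intact, and FLATMAP(PAIRWITH_2) then fans out over the second set inside each copy.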

We assume that the underlying domain is the real numbers, and 
that we have associated arithmetic operators. We also add tradi- 
tional aggregate functions like COUNT and SUM to the algebra; 
these functions take a set of elements (of the appropriate type) and 
return a value. 

In order to simplify our presentation, we make several small changes that relax the typing constraints in the classic monad algebra. In particular, we want to allow union to combine sets of tuples with different types. To this end, we introduce a special NIL value. This value is the result of any query that is undefined on the input data, such as projection on a nonexistent attribute. This value has a form of "null semantics" in that values combined with NIL are NIL, and NIL elements in a set are ignored by aggregates. In addition, we introduce a special aggregate function GET. When given a set, this function returns its contents if it is a singleton, and returns NIL otherwise. Neither this function nor the presence of NIL significantly affects the expressive power of the monad algebra [40].
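The NIL conventions can be summarized in a few lines of Python. This is a sketch of the rules just described, not part of the formalism: `None` stands in for NIL, lists model sets, and the helper names are ours.

```python
NIL = None  # stand-in for the special NIL value; lists model sets

def add(a, b):
    """'Null semantics': any value combined with NIL is NIL."""
    return NIL if a is NIL or b is NIL else a + b

def SUM(s):
    """Aggregates ignore NIL elements."""
    return sum(v for v in s if v is not NIL)

def COUNT(s):
    return sum(1 for v in s if v is not NIL)

def GET(s):
    """The element of a singleton set; NIL for empty or larger sets."""
    return s[0] if len(s) == 1 else NIL

def project(t, attr):
    """Projection on a nonexistent attribute (or on NIL) yields NIL."""
    return NIL if t is NIL else t.get(attr, NIL)

print(GET([7]), GET([]), GET([1, 2]))          # 7 None None
print(SUM([1, NIL, 2]), COUNT([1, NIL, 2]))    # 3 2
print(add(1, project({"x": 1}, "y")))          # None
```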

B.1 Monad Algebra Translation

For the purpose of illustration, we assume that our simulation has only one class of agents, all of which are running the same simulation script. It is relatively easy to generalize our approach to multiple agent classes or multiple scripts. Given this assumption, our simulation data is simply a set of tuples $\{t_0, \ldots, t_n\}$ where each tuple $t_i$ represents the data inside of an agent. Every agent tuple has a special attribute KEY, which is used to uniquely identify the agent; variables which reference another agent make use of this key. The state-effect pattern requires that all data types other than agents be processed by value, so they can safely be stored inside each agent.

We let $\tau$ represent the type/schema of an agent. In addition to the key attribute, $\tau$ has an attribute for each state and effect field. The value of a state attribute is the value of the field. The value of an effect attribute is a pair $(1: n,\ 2: \mathrm{AGG})$, where $n$ is a unique identifier for the field and AGG is the aggregate for this effect.

During the query phase, we represent effects as a tuple $(k: \mathbb{N},\ e: \mathbb{N},\ v: \alpha)$, where $k$ is the key of the object being affected, $e$ is the effect field identifier, and $v$ is the value of the effect. As a shorthand, let $\rho$ be this type. Even though effects may have different types, because of our relaxed typing, this will not harm our formalism.

The syntax of BRASIL forces the programmer to clearly separate the code into a query script (i.e., run()) and an update script (the update rules). A query script compiles to an expression whose input and output are the tuple $(1: \tau',\ 2: \{\tau\},\ 3: \{\rho\})$. The first element represents the active agent for this script; $\tau'$ "extends" type $\tau$ in that it is guaranteed to have an attribute for the key and each state field, but it may have more attributes. The second element is the set of all other agents with which this agent must communicate. The last element is the set of effects generated by this script.

Let $Q$ be the monad expression for the query script. Then the effect generation stage is the expression

$$\mathcal{E}(Q) = (\mathrm{ID} \times \mathrm{ID}) \circ \mathrm{NEST}_2 \circ \mathrm{MAP}(\hat{Q}) \qquad (2)$$

where $\hat{Q}$ is defined as

$$\hat{Q} = (1: \pi_1,\ 2: \pi_2,\ 3: \{\}) \circ Q \circ (1: \pi_1,\ 2: \pi_3) \qquad (3)$$

This produces a set of agents and the effects that they have generated (which may or may not be local). In general, we will aggregate aggressively, so each agent will only have one effect for each pair $k, e$. For the effect aggregation stage, we must aggregate the effects for each agent and inline them into the agent tuple. If we only have local effect assignments, then this expression is $\mathcal{E}(Q) \circ \mathcal{A}$, where

where the $s_i$ are the state fields and the $e_j$ are the effect fields. However, in the case where we have non-local effects, we must first redistribute them via the expression

$$\mathcal{R} = (\pi_1 \times \pi_2) \circ \mathrm{MAP}\big((1: \pi_1,\ 2: \mathrm{FLATTEN} \circ \sigma_{\pi_k = \pi_1 \circ \pi_{\mathrm{KEY}}})\big) \qquad (5)$$

So the entire query phase is $\mathcal{E}(Q) \circ \mathcal{R} \circ \mathcal{A}$. Finally, for the update phase, each state field $s_i$ has an update rule which corresponds to an expression $U_{s_i}$. These scripts read the output of the expression $\mathcal{A}$. Hence the query for our entire simulation is the expression

$$\mathcal{E}(Q) \circ \mathcal{R} \circ \mathcal{A} \circ \mathcal{U}(U_{s_0}, \ldots, U_{s_n}) \qquad (6)$$

where the update phase is defined as

$$\mathcal{U}(U_{s_0}, \ldots, U_{s_n}) = \mathrm{MAP}\big((\mathrm{KEY}: \pi_{\mathrm{KEY}},\ s_0: U_{s_0},\ \ldots,\ s_n: U_{s_n})\big) \qquad (7)$$
The only remaining detail in our formal semantics is to define semantics for the query scripts and update scripts. Update scripts are just simple calculations on a tuple, and are straightforward. The only nontrivial part concerns the query scripts. A script is just a sequence of statements $S_0; \ldots; S_n$, where each statement is a variable declaration, assignment, or control structure (e.g., conditional, foreach-loop). See the BRASIL language manual for more information on the complete grammar [11]. It suffices to define, for each statement $S$, a monad algebra expression $[[S]]$ whose input and output are the triple $(1: \tau',\ 2: \{\tau\},\ 3: \{\rho\})$; we handle sequences of commands by composing these expressions.

$$[[\texttt{const}\ \tau\ x = E]]_V = (1: \chi_x([[E]]_V),\ 2: \pi_2,\ 3: \pi_3)$$
$$[[\texttt{effect}\ \tau\ x : f]]_V = (1: \chi_x((1: p(x),\ 2: f)),\ 2: \pi_2,\ 3: \pi_3)$$
$$[[x \leftarrow E]]_V = \big(1: \pi_1,\ 2: \pi_2,\ 3: \pi_3 \oplus ((1: \pi_1 \circ \pi_{\mathrm{KEY}},\ 2: p(x),\ 3: [[E]]_V) \circ \mathrm{SNG})\big)$$
$$[[R.x \leftarrow E]]_V = \big(1: \pi_1,\ 2: \pi_2,\ 3: \pi_3 \oplus ((1: [[R]]_V,\ 2: p(x),\ 3: [[E]]_V) \circ \mathrm{SNG})\big)$$
$$[[\texttt{if}\ (E)\ \{B_1\}\ \texttt{else}\ \{B_2\}]]_V = \big(1: \pi_1,\ 2: \pi_2,\ 3: \mathrm{SNG} \circ \sigma_{[[E]]_V} \circ \mathrm{GET} \circ [[B_1]]_V \oplus \mathrm{SNG} \circ \sigma_{\neg [[E]]_V} \circ \mathrm{GET} \circ [[B_2]]_V\big)$$
$$[[\texttt{foreach}\ (\tau\ x : E)\ \{B\}]]_V = \big(1: \pi_1,\ 2: \pi_2,\ 3: (1: \pi_1 \circ \chi_x([[E]]_V) \circ \mathrm{PAIRWITH}_x,\ 2: \pi_2,\ 3: \pi_3) \circ \mathrm{FLATMAP}([[B]]_V \circ \pi_3)\big)$$

Figure 11: Translation for Common Commands

Recall that our query script semantics depends on the visibility constraints in the script. We generalize the approach from Section 4.1 and represent visibility as a predicate $V(x, y)$ which compares two agents. For any statement $S$, we let $[[S]]_V$ be its interpretation with this constraint and $[[S]]$ be the semantics without.

Before translating statements, we must translate expressions that may appear inside of them. The only nontrivial expressions are references; arithmetic expressions or other complex expressions translate to the monad algebra in the natural way. References return either the variable value, or the key for the agent referenced. Ignoring visibility constraints, for any identifier $x$, we define

$$[[x]] = \begin{cases} \mathrm{PAIRWITH}_3 \circ \sigma_{\pi_3 \circ \pi_2 = p(x)} \circ \sigma_{\pi_1 \circ \pi_{\mathrm{KEY}} = \pi_3 \circ \pi_1} \circ \mathrm{GET} & \text{if } x \text{ is an effect} \\ \pi_1 \circ \pi_x & \text{otherwise} \end{cases} \qquad (8)$$

In general, for any reference $E.x$, we define

$$[[E.x]] = (1: \pi_2 \circ \sigma_{\pi_{\mathrm{KEY}} = [[E]]} \circ \mathrm{GET},\ 2: \pi_2,\ 3: \pi_3) \circ [[x]] \qquad (9)$$

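If we include visibility constraints, $[[E]]_V$ is defined in much the same way as $[[E]]$, except when $E$ is an agent reference. In that case,

$$[[x]]_V = \big(1: \mathrm{ID},\ 2: \pi_2 \circ \sigma_{\pi_{\mathrm{KEY}} = [[x]]} \circ \mathrm{GET}\big) \circ (1: V,\ 2: \pi_2) \circ \mathrm{SNG} \circ \sigma_{\pi_1} \circ \mathrm{GET} \circ \pi_2 \circ \pi_{\mathrm{KEY}} \qquad (10)$$

This expression temporarily retrieves the object, tests if it is visible, and returns NIL if not.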
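The Python sketch below contrasts the two views of visibility for a single reference lookup: resolving a weak reference and then testing visibility, as in (10), versus first restricting the read set to visible agents and then resolving without further checks. It only illustrates the intuition behind Theorem 1 and is not a proof. The predicate `V` (a rectangle of half-width 1), the dict-based agents with a `KEY` attribute, and the function names are assumptions for illustration; `None` stands in for NIL.

```python
NIL = None  # stand-in for NIL

def V(a, b, half_width=1.0):
    """Hypothetical visibility predicate: b lies in a's rectangle of half-width 1."""
    return abs(a["x"] - b["x"]) <= half_width and abs(a["y"] - b["y"]) <= half_width

def GET(s):
    return s[0] if len(s) == 1 else NIL

def resolve_weak(me, others, key):
    """Like (10): fetch the referenced agent, test V, return its key or NIL."""
    target = GET([o for o in others if o["KEY"] == key])
    if target is NIL or not V(me, target):
        return NIL
    return target["KEY"]

def resolve_filtered(me, others, key):
    """Restrict the read set to visible agents first, then resolve without checks."""
    visible = [o for o in others if V(me, o)]
    target = GET([o for o in visible if o["KEY"] == key])
    return NIL if target is NIL else target["KEY"]

me = {"KEY": 0, "x": 0.0, "y": 0.0}
others = [{"KEY": 1, "x": 0.5, "y": -0.5}, {"KEY": 2, "x": 3.0, "y": 0.0}]
print([resolve_weak(me, others, k) for k in (1, 2, 3)])   # [1, None, None]
```

The second form is what makes filtering attractive in an implementation: the visibility test is paid once per replicated agent rather than on every dereference.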

To complete our semantics, we introduce the following notation.

• $\chi_a(f)$ is an operation that takes a tuple and extends it with an attribute $a$ having value $f$. It is definable in the monad algebra, but its exact definition depends on its usage context.

• $\oplus$ is an operation that takes two sets of effects and aggregates those with the same key and effect identifier. It is definable in the monad algebra, but its exact definition depends on the effect fields in the BRASIL script.

• $p(x)$ is the effect identifier for a variable $x$. In practice, this is the position of the declaration of $x$ in the BRASIL script.

Given these three operations, Figure 11 illustrates the translation of some of the more popular statements in the monad algebra. In general, variable declarations modify the first element of the input triple (i.e., the active agent), while assignments and control structures modify the last element (i.e., the effects).

As we discussed in Section 4.2, this formalism allows us to apply standard algebraic rewrites from the monad algebra for optimization. For example, many of the operators in Figure 11 (particularly the tuple constructions) are often unnecessary. They are there to preserve the input and output format, in order to facilitate composition. There are rewrite rules that function like dead-code elimination, in that they remove tuples that are not being used. One of the consequences is that many foreach-loops simplify to the form

$$F(E, B) = (1: \mathrm{ID},\ 2: E) \circ \mathrm{PAIRWITH}_2 \circ \mathrm{FLATMAP}(B) \qquad (11)$$

Note that this form is "half" of the cartesian product in (1); it joins a single value with a set of values. Thus when we simplify the foreach-loop to this form, we can often apply join optimization techniques to the result.

Another advantage of this formalism is that it allows us to prove 
correctness results. For example, the semantics of the visibility 
constraints in BRASIL is defined in terms of weak references. 
However, our implementation involves restricting the read set of 
each agent to those that are visible. It is a simple exercise to use our formalism to prove that these approaches are equivalent. The following result is the formal version of Theorem 1 from Section 4.1.

Theorem 1. Let $Q$ be a BRASIL query script whose references are restricted by visibility predicate $V$. Then

$$\mathrm{NEST}_2 \circ \mathrm{MAP}([[Q]]_V) = \sigma_V \circ \mathrm{NEST}_2 \circ \mathrm{MAP}([[Q]]) \qquad (12)$$

Furthermore, let

$$O(F) = F \circ (\pi_2 \times \pi_3) \circ \sigma_{\pi_1 \circ \pi_{\mathrm{KEY}} = \pi_2 \circ \pi_1} \circ \mathrm{MAP}(\pi_1) \qquad (13)$$

be the set of objects affected by an expression $F$. Then

$$\mathrm{MAP}\big((1: \pi_1,\ 2: O([[Q]]_V))\big) \circ \sigma_V = \mathrm{MAP}\big((1: \pi_1,\ 2: O([[Q]]_V))\big) \qquad (14)$$

The significance of (12) is that, instead of implementing the overhead of checking for weak references, we can filter out the agents that are not visible and eliminate any further visibility checking. The significance of (14) is that weak references ensure agents can only affect visible agents.

B.2 Effect Inversion

As we saw in Section 4.2, there is an advantage to writing a
BRASIL script so that all effect assignments are local. This may
not always be natural, as the underlying scientific models
may be expressed in terms of non-local effects. However, in certain
cases, we may be able to automatically rewrite a BRASIL program
to use only local effects. In particular, if there are no visibility
constraints, then we can always invert effect assignments to make
them local-only. The following is the formal version of the result
stated in Section 4.2.



Theorem 2 Let $Q$ be a query script with no visibility constraints.
There is a script $Q'$ with only local effects such that $[\![Q]\!] = [\![Q']\!]$.

Proof Sketch. Our proof takes advantage of the fact that effect
fields (as opposed to effect variables) may not be read during
the query phase, and that effects are aggregated independently
of order. We start with $Q$ and create a copy script $Q_1$. Within this
copy, we remove all syntactically non-local effect assignments (e.g.,
E.x <- v). Some of these may actually be local in the semantic
sense, but this does not affect our proof.

We construct another copy $Q_2$. For this copy, we pick a variable
a that does not appear in $Q$. We replace every local state reference x
in $Q$ with a.x. We also remove all local effect assignments. Finally,
we replace each syntactically non-local assignment E.x <- v with
the conditional assignment if (E == this) { x <- v }. We then
let $Q_3$ be the script

    foreach (Agent a : Extent<Agent>) { Q_2(a) }

That is, $Q_3$ has each agent run the script on behalf of every other
agent, search for effects directed at itself, and then assign them
locally. The script formed by combining $Q_1$ and $Q_3$ is our desired script. □
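The construction can be exercised on a toy example. In the Python sketch below, plain functions stand in for BRASIL scripts: script is a hypothetical agent behavior with one non-local effect, effects are aggregated by summation (one order-independent choice, assumed here), and run_inverted follows the shape of $Q_3$ by having each agent replay every agent's behavior and keep only the effects addressed to itself. The toy script has no local effects, so the $Q_1$ part is empty.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical agent state: (agent id, strength).
Agent = Tuple[int, float]
Emit = Callable[[int, float], None]


def script(me: Agent, world: List[Agent], emit: Emit) -> None:
    """Stand-in for the original script Q with a non-local effect.

    Agent `me` assigns the effect E.hurt <- v to every weaker agent E.
    """
    my_id, my_strength = me
    for other_id, other_strength in world:
        if other_id != my_id and my_strength > other_strength:
            emit(other_id, my_strength - other_strength)


def run_nonlocal(world: List[Agent]) -> Dict[int, float]:
    """Original semantics: effects are sent to their targets and summed."""
    hurt: Dict[int, float] = defaultdict(float)

    def emit(target: int, v: float) -> None:
        hurt[target] += v

    for me in world:
        script(me, world, emit)
    return dict(hurt)


def run_inverted(world: List[Agent]) -> Dict[int, float]:
    """Inverted semantics: every agent replays all scripts and keeps effects aimed at itself."""
    hurt: Dict[int, float] = {}
    for this_id, _ in world:
        local: List[float] = []

        def emit_if_this(target: int, v: float, this_id: int = this_id) -> None:
            if target == this_id:      # if (E == this) { hurt <- v }
                local.append(v)

        for a in world:                # foreach (Agent a : Extent<Agent>) { Q_2(a) }
            script(a, world, emit_if_this)
        if local:
            hurt[this_id] = sum(local)  # only local effect assignments remain
    return hurt


world = [(0, 3.0), (1, 1.0), (2, 2.0)]
print(run_nonlocal(world))  # {1: 3.0, 2: 1.0}
assert run_nonlocal(world) == run_inverted(world)
```

The inverted version calls script once per (agent, replayed agent) pair, which makes the extra foreach-loop and its cost visible.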

Note that this conversion comes at the cost of an additional
foreach-loop, as each agent simulates the actions of all other
agents. Thus, the converted script is much more computationally
expensive than the original one. However, we can often simplify
it to remove the extra loop. As mentioned previously, a foreach-loop
can often be simplified to the form in (11). In the case of two
nested loops over the same set $E$, merging these two loops is
a type of self-join. That is,

$$\begin{aligned}
F(E, F(E,B)) &= \langle 1:\mathrm{ID},\ 2:E \rangle \circ \mathrm{PAIRWITH}_2 \circ \mathrm{FLATMAP}(\langle 1:\mathrm{ID},\ 2:E \rangle \circ \mathrm{PAIRWITH}_2 \circ \mathrm{FLATMAP}(B)) \\
&= \langle 1:\mathrm{ID},\ 2:E,\ 3:E \rangle \circ \mathrm{PAIRWITH}_2 \circ \mathrm{FLATMAP}(\mathrm{PAIRWITH}_3 \circ \mathrm{FLATMAP}(B')) \\
&= \langle 1:\mathrm{ID},\ 2:(E \times E) \rangle \circ \mathrm{PAIRWITH}_2 \circ \mathrm{FLATMAP}(B'')
\end{aligned}$$

where $B'$ and $B''$ are $B$ rewritten to account for the change in tuple
positions. As part of this rewrite, we may discover that the self-join is
redundant in the expression $B''$ and eliminate it; this is how we obtain
simple effect inversions like the one illustrated in Section 4.2.
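The loop merge and the elimination of a redundant self-join can be checked on plain lists. This Python sketch is only an analogy to the monad-algebra rewrite: nested, merged, simplified, and collects are hypothetical names, and the condition under which simplified is valid is stated in its docstring rather than derived automatically.

```python
from itertools import product
from typing import Callable, List

# body(x, e1, e2) returns the values produced for one combination of loop variables.
Body = Callable[[int, int, int], List[int]]


def nested(x: int, extent: List[int], body: Body) -> List[int]:
    """F(E, F(E, B)): two nested foreach-loops over the same extent."""
    out: List[int] = []
    for e1 in extent:
        for e2 in extent:
            out.extend(body(x, e1, e2))
    return out


def merged(x: int, extent: List[int], body: Body) -> List[int]:
    """A single loop over E x E, i.e. a self-join of the extent with itself."""
    out: List[int] = []
    for e1, e2 in product(extent, extent):
        out.extend(body(x, e1, e2))
    return out


def simplified(x: int, extent: List[int], body: Body) -> List[int]:
    """Self-join eliminated: valid only when body is empty unless e2 == x
    and x occurs exactly once in the extent."""
    out: List[int] = []
    for e1 in extent:
        out.extend(body(x, e1, x))
    return out


def collects(x: int, e1: int, e2: int) -> List[int]:
    """Agent x collects the ids of agents e1 that target it (e2 == x)."""
    return [e1] if e2 == x and e1 != x else []


print(nested(2, [1, 2, 3], collects))  # [1, 3]
```

itertools.product iterates with e1 as the outer variable, so merged produces results in the same order as nested.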

In the case of visibility constraints, the situation becomes a little
more complex. In order to perform the inversion from the proof of
Theorem 2, we must require that any agent $a_1$ that assigns effects
to another agent $a_2$ restrict its visibility to agents visible to $a_2$;
that way $a_2$ can get the same results when it reproduces the actions
of $a_1$. This is fairly restrictive, as it suggests that every agent needs
to be visible to every other agent.

We can do better by introducing an information flow analysis.
We only require that, for each non-local effect assigned to an agent,
the effect is computed using only information from agents visible
to the agent being assigned. However, this property depends on the
values of the agents and cannot be inferred statically from the
script. Thus it is infeasible to exploit this property in
general.

However, there is another way to invert scripts in the presence of
visibility constraints. Suppose the visibility constraint for a script
$Q$ is a distance bound, such as $d(x,y) < R$. If we relax the visibility
constraint for the script in the proof of Theorem 2 to $d(x,y) < 2R$,
then the proof carries through again. We state this modified result
as follows:

Theorem 3 Let $Q$ be a query script with visibility constraint $V$.
Let $V'$ be such that $V'(x,y)$ if and only if $\exists z\, V(x,z) \wedge V(z,y)$. Then
there is a script $Q'$ with only local effects such that $[\![Q]\!]_V = [\![Q']\!]_{V'}$.
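For a distance bound, the relaxed constraint $V'$ in Theorem 3 is exactly the bound $d(x,y) < 2R$ mentioned above: the midpoint of $x$ and $y$ witnesses the existential when $d(x,y) < 2R$, and the triangle inequality rules out any witness otherwise. The Python sketch below checks this numerically on random points in the plane; Euclidean distance, R = 1, and the sampling scheme are assumptions for illustration only.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
R = 1.0  # hypothetical visibility radius


def V(p: Point, q: Point) -> bool:
    """Original visibility constraint: a distance bound d(p, q) < R."""
    return math.dist(p, q) < R


def V_prime(p: Point, q: Point, candidates: List[Point]) -> bool:
    """V'(p, q) iff some z among the candidates has V(p, z) and V(z, q)."""
    return any(V(p, z) and V(z, q) for z in candidates)


def relaxation_matches_2R(n: int = 25, seed: int = 0) -> bool:
    """Check that, on random points, V' coincides with the bound d(p, q) < 2R.

    The midpoint of p and q is added as a candidate z: it witnesses V'
    whenever d(p, q) < 2R, and the triangle inequality rules out any
    witness when d(p, q) >= 2R.
    """
    rng = random.Random(seed)
    pts = [(rng.uniform(0.0, 4.0), rng.uniform(0.0, 4.0)) for _ in range(n)]
    for p in pts:
        for q in pts:
            mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
            if V_prime(p, q, pts + [mid]) != (math.dist(p, q) < 2 * R):
                return False
    return True


print(relaxation_matches_2R())  # True
```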

Proof Sketch. The proof is similar to that of Theorem 2. The
only difference is that we have to ensure that the increased visibility
for $Q'$ does not cause the weak references in a script to resolve to
agents that would otherwise have evaluated to NIL. In the construction
of $Q_2$, we use local constants to normalize the expressions so
that any agent reference in the original script becomes a local constant.
For example, suppose each agent has a field friend that is a
reference to another agent. If we have a conditional of the form



    if (friend.x - x < BOUND) { ... }

then we normalize this expression as

    const agent temp = friend;
    if (temp.x - x < BOUND) { ... }



We then wrap these introduced constants with conditionals that
test for visibility with respect to the old constraints. For example,
the code above would become

    const agent temp = (visible(this, friend) ? friend : null);
    if (temp.x - x < BOUND) { ... }

where visible is a method evaluating the visibility constraint and
null evaluates to NIL in the monad algebra. Given the semantics of NIL,
this translation has the desired result. □
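The effect of the wrapper can be seen in a small Python sketch. It assumes a one-dimensional distance bound, a hypothetical threshold BOUND, and that a comparison involving NIL (modeled as None) simply makes the conditional not fire; under these assumptions, evaluating the unwrapped conditional with the relaxed radius 2R changes the result, while the wrapped version reproduces the original one.

```python
from dataclasses import dataclass
from typing import Optional

R = 1.0      # hypothetical original visibility radius (constraint V)
BOUND = 0.5  # hypothetical threshold from the conditional in the text


@dataclass
class Agent:
    x: float
    friend: Optional["Agent"] = None


def resolve(this: Agent, ref: Optional[Agent], radius: float) -> Optional[Agent]:
    """Weak reference under a distance bound: NIL (None) unless within `radius`."""
    if ref is None or abs(ref.x - this.x) >= radius:
        return None
    return ref


def visible(this: Agent, other: Optional[Agent]) -> bool:
    """Evaluates the OLD visibility constraint V."""
    return other is not None and abs(other.x - this.x) < R


def fires_original(this: Agent) -> bool:
    # if (friend.x - x < BOUND) { ... }, evaluated under V
    f = resolve(this, this.friend, R)
    return f is not None and f.x - this.x < BOUND  # a test on NIL does not fire


def fires_relaxed_unwrapped(this: Agent) -> bool:
    # The same conditional evaluated under V' (radius 2R), without the wrapper.
    f = resolve(this, this.friend, 2 * R)
    return f is not None and f.x - this.x < BOUND


def fires_relaxed_wrapped(this: Agent) -> bool:
    # const agent temp = (visible(this, friend) ? friend : null);
    temp = this.friend if visible(this, this.friend) else None
    temp = resolve(this, temp, 2 * R)  # the dereference itself happens under V'
    return temp is not None and temp.x - this.x < BOUND


far = Agent(0.0, friend=Agent(-1.5))   # friend visible under V' but not under V
print(fires_original(far), fires_relaxed_unwrapped(far), fires_relaxed_wrapped(far))
# False True False
```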

C. DETAILS OF SIMULATION MODELS 

This section describes the two simulation models we have imple- 
mented for BRACE single-node performance and scalability exper- 
iments. 

Traffic Simulation. Traffic simulation is required to provide accurate,
high-resolution, and realistic transportation activity for the design
and management of transportation systems. MITSIM, a state-of-the-art
single-node behavioral traffic simulator, has several different
models covering different aspects of driver behavior [47].
For example, during each time step, a lane selection model makes
the driver inspect the lead and rear vehicles as well as the average
velocity of the vehicles in her current, left, and right lanes
(within lookahead distance parameter ρ) to compute the utility
function for each lane. A probabilistic lane selection decision
is then made according to the lane utility. If the driver decides to
change her lane, she needs to inspect the gaps from herself to the
lead and rear vehicles in the target lane to decide if it is safe to
change to the target lane in the next time step. Otherwise, the vehicle
following model is used to adapt her velocity based on the lead
vehicle. The newly computed velocity replaces the old velocity
in the next time step. Note that if the driver cannot find a lead or
rear vehicle within ρ, she simply assumes the distance to the lead or
rear vehicle is infinite, and adjusts her velocity according to a free-flow
submodel. Because only limited information about MITSIM's
driving models is available in the literature [47], we found it crucial
to ensure that our implementation of MITSIM's lane changing and
acceleration models was as accurate as possible.
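As a sketch of the neighbor lookup just described, the Python function below finds the lead and rear gaps in one lane and treats a vehicle beyond the lookahead distance as absent (infinite gap). It assumes the positions of the other vehicles in the lane are kept in a sorted list and uses binary search; this is an illustration only, not MITSIM's hand-coded index or BRACE's spatial index, and the lane-utility and lane-change models are not reproduced.

```python
import bisect
import math
from typing import List, Tuple


def lead_and_rear_gaps(lane: List[float], pos: float, lookahead: float) -> Tuple[float, float]:
    """Gaps from a vehicle at `pos` to its lead and rear vehicles in one lane.

    `lane` holds the positions of the other vehicles in that lane, sorted
    ascending and excluding the vehicle itself. If no vehicle is found within
    `lookahead`, the corresponding gap is reported as infinite.
    """
    i = bisect.bisect_right(lane, pos)          # index of first vehicle strictly ahead
    lead = lane[i] - pos if i < len(lane) else math.inf
    rear = pos - lane[i - 1] if i > 0 else math.inf
    if lead > lookahead:
        lead = math.inf
    if rear > lookahead:
        rear = math.inf
    return lead, rear


print(lead_and_rear_gaps([10.0, 150.0, 400.0], 100.0, 200.0))  # (50.0, 90.0)
print(lead_and_rear_gaps([350.0], 100.0, 200.0))               # (inf, inf)
```

An infinite gap is what would then route the driver to the free-flow submodel.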

We therefore validate the consistency of the MITSIM model encoded
in BRASIL in terms of the simulated traffic conditions. Note
that since MITSIM hand-coded a nearest-neighbor index for
accessing the lead and rear vehicles, its lookahead distance
actually varies for each vehicle. In our reimplementation
we fix this distance to 200 so that we can apply single-node spatial
indexing. We compare lane changing frequencies, average lane
velocity, and average lane density on both simulators, using a
segment of length 20,000. The statistical difference is measured by RMSPE
(Root Mean Square Percentage Error), which is often used as a
goodness-of-fit measure in the traffic simulation literature [9]. The
results for all three statistics are shown in Table 2. Except for
Lane 4's average density and change frequency, all the other
statistics demonstrate strong agreement between the two
simulators. This exception is due to the fact that in the MITSIM
lane changing model, drivers have a reluctance factor to change to
the rightmost lane (i.e., Lane 4). As a result, there are only a few
vehicles in that lane (56.33 vehicles on average compared to 351.42
in the other lanes), and small deviations in lane changing records due to
the fixed lookahead distance approximation can contribute
significantly to the error measurement.
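For reference, RMSPE in its usual root-mean-square form compares $N$ paired measurements, with $x_n$ taken from the reference simulator and $\hat{x}_n$ from the reimplementation. Which simulator is treated as the reference, and what the $N$ measurements index, are assumptions here, since they are not spelled out above:

$$\mathrm{RMSPE} = 100\% \times \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(\frac{\hat{x}_n - x_n}{x_n}\right)^2}$$

A direct Python transcription of this formula follows.

```python
import math
from typing import Sequence


def rmspe(reference: Sequence[float], simulated: Sequence[float]) -> float:
    """Root mean square percentage error, returned in percent.

    `reference` values must be nonzero; which simulator serves as the
    reference is a modelling choice, not fixed by this function.
    """
    if len(reference) != len(simulated) or len(reference) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    total = 0.0
    for r, s in zip(reference, simulated):
        total += ((s - r) / r) ** 2
    return 100.0 * math.sqrt(total / len(reference))


# Two measurements, each off by 10% in opposite directions.
print(round(rmspe([100.0, 200.0], [110.0, 180.0]), 6))  # 10.0
```

Because each error is divided by the reference value, a lane with small counts, such as the lightly used Lane 4, can show a large percentage error from a small absolute deviation.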

Lane   Change Frequency   Avg. Density   Avg. Velocity
L1     8.93%              7.42%          0.007%
L2     5.57%              10.38%         0.007%
L3     7.67%              9.38%          0.007%
L4     21.37%             19.72%         0.007%

Table 2: RMSPE for Traffic Simulation (LookAhead = 200)

Fish School Simulation. Couzin et al. have built a behavioral fish
school simulation model to study information transfer in groups
of fish when group members cannot recognize which companions
are informed individuals who know about a food source [12]. This
computational model proceeds in time steps: at each time period,
each fish inspects its environment to decide on the direction
it will take during the next time period. Two basic behaviors
of a single fish are avoidance and attraction. Avoidance has the
higher priority: whenever a fish is too close to others (i.e., at a
distance less than a parameter α), it tries to turn away from them.
If there is no other fish within distance α, then the fish is attracted
to other fish within distance ρ > α. These influences are summed and
normalized with respect to the current fish. Therefore, individuals
outside the visibility range ρ of the current individual do
not influence its movement decision. In addition, informed
individuals have a preferred direction, e.g., the direction to the food
source or the direction of migration. These individuals balance the
strength of their social interactions (attraction and avoidance) with
their preferred direction according to a weight parameter ω.
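The two-rule decision of a single fish can be sketched in Python as follows. This is a simplified illustration, not Couzin et al.'s full model or the BRASIL script used in our experiments: positions are 2-D tuples; alpha, rho, and omega stand for the avoidance distance, the attraction range, and the weight parameter described above; a fish with no neighbor in range keeps its heading; and an informed fish is assumed to combine its social direction with its preferred direction as the normalized sum social + omega * preferred.

```python
import math
from typing import Iterable, List, Optional, Tuple

Vec = Tuple[float, float]


def _sum(vs: Iterable[Vec]) -> Vec:
    x = y = 0.0
    for vx, vy in vs:
        x += vx
        y += vy
    return (x, y)


def _unit(v: Vec) -> Vec:
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n) if n > 0 else (0.0, 0.0)


def desired_direction(pos: Vec, heading: Vec, others: List[Vec],
                      alpha: float, rho: float,
                      preferred: Optional[Vec] = None, omega: float = 0.0) -> Vec:
    """Direction a fish at `pos` takes next; distance 0 is treated as the fish itself."""
    close = [p for p in others if 0 < math.dist(pos, p) < alpha]
    if close:
        # Avoidance has priority: sum of unit vectors pointing away from close fish.
        social = _unit(_sum(_unit((pos[0] - p[0], pos[1] - p[1])) for p in close))
    else:
        # Attraction toward fish within rho; fish beyond rho have no influence.
        seen = [p for p in others if 0 < math.dist(pos, p) <= rho]
        social = (_unit(_sum(_unit((p[0] - pos[0], p[1] - pos[1])) for p in seen))
                  if seen else heading)
    if preferred is not None:
        # Informed fish: weight the preferred direction by omega (assumed form).
        social = _unit((social[0] + omega * preferred[0],
                        social[1] + omega * preferred[1]))
    return social


print(desired_direction((0.0, 0.0), (1.0, 0.0), [(0.5, 0.0)], alpha=1.0, rho=5.0))  # (-1.0, 0.0)
```

Because only fish within rho enter the computation, each fish's read set is bounded by a distance, which is the kind of visibility constraint the spatial partitioning exploits.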

Predator Simulation. Since both the traffic and the fish school
simulations only use local effect assignments, we designed a new
predator simulation, inspired by artificial society simulations [27].
In this simulation, a fish can "spawn" new fish and "bite" other fish,
possibly killing them, so density naturally approaches an
equilibrium value at which births and deaths are balanced. Since effect
inversion is not yet implemented in the BRASIL compiler, we program
biting behavior either as a non-local effect assignment (fish
assign "hurt" effects to others) or as a local one (fish collect "hurt"
effects from others) in otherwise identical BRASIL scripts.



